[00:00:42] (PS1) BBlack: cache_upload: swift conns limit -> 10K [puppet] - https://gerrit.wikimedia.org/r/310177
[00:00:52] (CR) BBlack: [C: 2 V: 2] cache_upload: swift conns limit -> 10K [puppet] - https://gerrit.wikimedia.org/r/310177 (owner: BBlack)
[00:02:34] PROBLEM - Esams HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 30.00% of data above the critical threshold [1000.0]
[00:03:44] PROBLEM - Codfw HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [1000.0]
[00:04:05] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [1000.0]
[00:05:51] PROBLEM - Varnishkafka log producer on cp2017 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishkafka
[00:08:15] PROBLEM - puppet last run on cp1099 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[retry-load-new-vcl-file]
[00:15:37] RECOVERY - puppet last run on cp1099 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[00:16:01] !log aaron@tin Synchronized php-1.28.0-wmf.18/includes/jobqueue/utils/PurgeJobUtils.php: 0419831bbae66bda1e4dca8d690b607b493b2f5f (duration: 00m 47s)
[00:16:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:17:06] !log aaron@tin Synchronized php-1.28.0-wmf.18/extensions/GeoData: 2dedca3789266b53cbde5bd88913c01175926974 (duration: 00m 48s)
[00:17:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:21:18] PROBLEM - puppet last run on db2054 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[00:23:10] RECOVERY - Varnishkafka log producer on cp2017 is OK: PROCS OK: 1 process with command name varnishkafka
[00:25:48] PROBLEM - Varnish HTTP upload-backend - port 3128 on cp1073 is CRITICAL: Connection refused
[00:25:49] PROBLEM - cassandra-b CQL 10.192.32.138:9042 on restbase2004 is CRITICAL: Connection refused
[00:28:12] RECOVERY - Varnish HTTP upload-backend - port 3128 on cp1073 is OK: HTTP OK: HTTP/1.1 200 OK - 178 bytes in 0.023 second response time
[00:41:19] (PS1) BBlack: Revert "Revert "cache_upload: switch to file storage backend on Varnish 4"" [puppet] - https://gerrit.wikimedia.org/r/310182
[00:41:45] (CR) BBlack: [C: 2 V: 2] Revert "Revert "cache_upload: switch to file storage backend on Varnish 4"" [puppet] - https://gerrit.wikimedia.org/r/310182 (owner: BBlack)
[00:46:20] RECOVERY - puppet last run on db2054 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[00:50:28] !log reverting eqiad to file storage
[00:50:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:50:40] heh, that was uninformative :)
[00:50:48] !log cache_upload: reverting eqiad to file storage
[00:50:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:03:43] ACKNOWLEDGEMENT - cassandra-b CQL 10.192.32.138:9042 on restbase2004 is CRITICAL: Connection refused eevans Fubar (https://phabricator.wikimedia.org/T144826).
[01:06:53] PROBLEM - Varnishkafka log producer on cp1048 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishkafka
[01:14:37] !log aaron@tin Synchronized php-1.28.0-wmf.18/includes/jobqueue/utils/PurgeJobUtils.php: (no message) (duration: 00m 52s)
[01:14:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:15:45] !log aaron@tin Synchronized php-1.28.0-wmf.18/extensions/GeoData: 2dedca3789266b53cbde5bd88913c01175926974 (duration: 00m 49s)
[01:15:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:18:04] PROBLEM - Varnishkafka log producer on cp1050 is CRITICAL: PROCS CRITICAL: 0 processes with command name varnishkafka
[01:19:57] RECOVERY - Esams HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[01:23:37] RECOVERY - Codfw HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[01:24:38] !log cache_upload: reverting codfw to file storage
[01:24:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:27:26] RECOVERY - Eqiad HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[01:27:39] !log aaron@tin Synchronized php-1.28.0-wmf.18/extensions/GlobalUsage: 1843a856c7c648a310a1983d53139e5b79ea6585 (duration: 00m 48s)
[01:27:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:29:13] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[01:29:27] RECOVERY - Varnishkafka log producer on cp1048 is OK: PROCS OK: 1 process with command name varnishkafka
[01:38:12] RECOVERY - Varnishkafka log producer on cp1050 is OK: PROCS OK: 1 process with command name varnishkafka
[01:45:11] (PS1) BBlack: upload frontends: hfp limit change 50MB -> 8MB [puppet] - https://gerrit.wikimedia.org/r/310185
[01:47:54] (CR) BBlack: [C: 2] upload frontends: hfp limit change 50MB -> 8MB [puppet] - https://gerrit.wikimedia.org/r/310185 (owner: BBlack)
[01:55:51] (PS1) Aaron Schulz: Lower $wgMaxUserDBWriteDuration to 3 [mediawiki-config] - https://gerrit.wikimedia.org/r/310186
[01:56:10] (PS2) Aaron Schulz: Lower $wgMaxUserDBWriteDuration to 3 [mediawiki-config] - https://gerrit.wikimedia.org/r/310186
[01:56:14] (CR) Aaron Schulz: [C: 2] Lower $wgMaxUserDBWriteDuration to 3 [mediawiki-config] - https://gerrit.wikimedia.org/r/310186 (owner: Aaron Schulz)
[01:56:40] (Merged) jenkins-bot: Lower $wgMaxUserDBWriteDuration to 3 [mediawiki-config] - https://gerrit.wikimedia.org/r/310186 (owner: Aaron Schulz)
[01:58:10] !log aaron@tin Synchronized wmf-config/CommonSettings.php: Lower $wgMaxUserDBWriteDuration to 3 (duration: 00m 47s)
[01:58:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:07:57] (PS1) BBlack: upload frontends: hfp limit change 8MB -> 1MB [puppet] - https://gerrit.wikimedia.org/r/310187
[02:11:30] (CR) BBlack: [C: 2] upload frontends: hfp limit change 8MB -> 1MB [puppet] - https://gerrit.wikimedia.org/r/310187 (owner: BBlack)
[02:13:20] (PS2) Andrew Bogott: puppettable: Eliminate most uses of instance_id [puppet] - https://gerrit.wikimedia.org/r/310148
[02:15:59] (CR) Andrew Bogott: [C: 2] puppettable: Eliminate most uses of instance_id [puppet] - https://gerrit.wikimedia.org/r/310148 (owner: Andrew Bogott)
[02:18:43] (PS2) Andrew Bogott: puppet tab: Rely less on fqdns and instance objects [puppet] - https://gerrit.wikimedia.org/r/310149
[02:18:45] (PS2) Andrew Bogott: Puppet tab: A tiny bit of error handling [puppet] - https://gerrit.wikimedia.org/r/310150
[02:19:56] (CR) jenkins-bot: [V: -1] puppet tab: Rely less on fqdns and instance objects [puppet] - https://gerrit.wikimedia.org/r/310149 (owner: Andrew Bogott)
[02:21:00] (CR) jenkins-bot: [V: -1] Puppet tab: A tiny bit of error handling [puppet] - https://gerrit.wikimedia.org/r/310150 (owner: Andrew Bogott)
[02:22:36] (PS3) Andrew Bogott: puppet tab: Rely less on fqdns and instance objects [puppet] - https://gerrit.wikimedia.org/r/310149
[02:22:38] (PS3) Andrew Bogott: Puppet tab: A tiny bit of error handling [puppet] - https://gerrit.wikimedia.org/r/310150
[02:27:23] (CR) Andrew Bogott: [C: 2] puppet tab: Rely less on fqdns and instance objects [puppet] - https://gerrit.wikimedia.org/r/310149 (owner: Andrew Bogott)
[02:27:56] (CR) Andrew Bogott: [C: 2] Puppet tab: A tiny bit of error handling [puppet] - https://gerrit.wikimedia.org/r/310150 (owner: Andrew Bogott)
[02:39:42] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.18) (duration: 17m 41s)
[02:39:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:42:55] Operations, Traffic, Patch-For-Review: Convert upload cluster to Varnish 4 - https://phabricator.wikimedia.org/T131502#2631544 (BBlack) Since the above update: 1. Tried converting eqiad + codfw to `persistent`, and fighting through a bit on levels of 503s in case they were just from aggressive rollou...
[02:45:16] Operations, Traffic, Patch-For-Review: Convert upload cluster to Varnish 4 - https://phabricator.wikimedia.org/T131502#2631547 (BBlack) Also, after the various restarts above, I re-set (at runtime with varnishadm) all cache_upload backends to 31 seconds for `lru_interval` in the hope that that's stil...
[02:46:54] !log l10nupdate@tin ResourceLoader cache refresh completed at Tue Sep 13 02:46:54 UTC 2016 (duration 7m 12s)
[02:47:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[04:37:57] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad is CRITICAL: CRITICAL - failed 32 probes of 247 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map
[04:39:47] (PS2) KartikMistry: Deploy Compact Language Links out of beta for Tulu Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/305997
[04:44:24] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad is OK: OK - failed 10 probes of 247 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map
[06:29:07] (CR) Alexandros Kosiaris: [C: -1] puppetmaster: rsync volatile and ca dirs between puppetmasters (1 comment) [puppet] - https://gerrit.wikimedia.org/r/310026 (owner: Giuseppe Lavagetto)
[07:07:20] !log installing openjdk6 security updates
[07:07:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:09:49] (PS1) Jcrespo: dbsoftware: add s* hosts lists [software] - https://gerrit.wikimedia.org/r/310223
[07:11:14] (CR) Marostegui: [C: 1] "very useful!!!" [software] - https://gerrit.wikimedia.org/r/310223 (owner: Jcrespo)
[07:28:17] (PS3) Muehlenhoff: Provide systemd units for keyholder-agent and keyholder-proxy [puppet] - https://gerrit.wikimedia.org/r/308132 (https://phabricator.wikimedia.org/T144043)
[07:29:37] (PS1) Ema: Merge branch 'master' into debian [software/varnish/libvmod-netmapper] (debian) - https://gerrit.wikimedia.org/r/310226
[07:30:04] (CR) Ema: [C: 2 V: 2] Merge branch 'master' into debian [software/varnish/libvmod-netmapper] (debian) - https://gerrit.wikimedia.org/r/310226 (owner: Ema)
[07:30:53] (PS1) Ema: New upstream release [software/varnish/libvmod-netmapper] (debian) - https://gerrit.wikimedia.org/r/310227 (https://phabricator.wikimedia.org/T131502)
[07:31:35] (CR) Ema: [C: 2 V: 2] New upstream release [software/varnish/libvmod-netmapper] (debian) - https://gerrit.wikimedia.org/r/310227 (https://phabricator.wikimedia.org/T131502) (owner: Ema)
[07:34:34] (PS1) Urbanecm: Fix hewiki logos [mediawiki-config] - https://gerrit.wikimedia.org/r/310228 (https://phabricator.wikimedia.org/T145017)
[07:50:43] !log remove kafkacat from stat1002 - not in puppet and causing cronspam (T132324)
[07:50:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:51:37] (CR) Jcrespo: [C: 2] dbsoftware: add s* hosts lists [software] - https://gerrit.wikimedia.org/r/310223 (owner: Jcrespo)
[07:52:32] !log wrong package name for my prev entry - remove kafkatee from stat1002 - not in puppet and causing cronspam (T132324)
[07:52:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[07:53:27] Operations, CirrusSearch, Discovery, Discovery-Search (Current work), Patch-For-Review: Upgrade elasticsearch and plugins to 2.3.5 - https://phabricator.wikimedia.org/T145404#2631890 (Gehel)
[07:57:05] (CR) Gehel: [C: 2 V: 2] Upgrade elasticsearch plugins to 2.3.5 [software/elasticsearch/plugins] - https://gerrit.wikimedia.org/r/310038 (https://phabricator.wikimedia.org/T145404) (owner: DCausse)
[07:57:53] (PS1) Jcrespo: dbtools: add missing x1 hosts (in addition to s*.hosts) [software] - https://gerrit.wikimedia.org/r/310233
[07:58:52] (CR) Jcrespo: [C: 2] dbtools: add missing x1 hosts (in addition to s*.hosts) [software] - https://gerrit.wikimedia.org/r/310233 (owner: Jcrespo)
[07:58:56] (CR) Jcrespo: [V: 2] dbtools: add missing x1 hosts (in addition to s*.hosts) [software] - https://gerrit.wikimedia.org/r/310233 (owner: Jcrespo)
[08:04:50] !log upgrading elasticsearch & plugins to 2.3.5 on relforge - T145404
[08:04:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:05:51] (CR) Muehlenhoff: [C: 2] Provide systemd units for keyholder-agent and keyholder-proxy [puppet] - https://gerrit.wikimedia.org/r/308132 (https://phabricator.wikimedia.org/T144043) (owner: Muehlenhoff)
[08:08:18] (CR) Giuseppe Lavagetto: puppetmaster: rsync volatile and ca dirs between puppetmasters (1 comment) [puppet] - https://gerrit.wikimedia.org/r/310026 (owner: Giuseppe Lavagetto)
[08:08:59] (PS2) Elukey: Add deployment-mediawiki04 to the deployment-prep scap dsh [puppet] - https://gerrit.wikimedia.org/r/310034 (https://phabricator.wikimedia.org/T144006)
[08:11:48] Operations, Analytics: kafkatee's logrotate/syslog default pkg files needs to be removed - https://phabricator.wikimedia.org/T145490#2631941 (elukey)
[08:12:38] (CR) Elukey: [C: 2] Add deployment-mediawiki04 to the deployment-prep scap dsh [puppet] - https://gerrit.wikimedia.org/r/310034 (https://phabricator.wikimedia.org/T144006) (owner: Elukey)
[08:13:24] (PS5) Giuseppe Lavagetto: puppetmaster: rsync volatile and ca dirs between puppetmasters [puppet] - https://gerrit.wikimedia.org/r/310026
[08:14:11] (CR) Hashar: "Change is a noop on beta cluster deployment-tin (Trusty) \o/" [puppet] - https://gerrit.wikimedia.org/r/308132 (https://phabricator.wikimedia.org/T144043) (owner: Muehlenhoff)
[08:14:45] PROBLEM - ElasticSearch health check for shards on relforge1002 is CRITICAL: CRITICAL - elasticsearch inactive shards 29835 threshold =0.1% breach: status: red, number_of_nodes: 2, unassigned_shards: 29829, number_of_pending_tasks: 402, number_of_in_flight_fetch: 0, timed_out: False, active_primary_shards: 282, task_max_waiting_in_queue_millis: 273758, cluster_name: relforge-eqiad, relocating_shards: 0, active_shards_percent_as_number
[08:16:06] PROBLEM - ElasticSearch health check for shards on relforge1001 is CRITICAL: CRITICAL - elasticsearch inactive shards 29583 threshold =0.1% breach: status: red, number_of_nodes: 2, unassigned_shards: 29577, number_of_pending_tasks: 709, number_of_in_flight_fetch: 0, timed_out: False, active_primary_shards: 534, task_max_waiting_in_queue_millis: 358092, cluster_name: relforge-eqiad, relocating_shards: 0, active_shards_percent_as_number
[08:16:13] Looking into it...
[08:17:03] ACKNOWLEDGEMENT - ElasticSearch health check for shards on relforge1001 is CRITICAL: CRITICAL - elasticsearch inactive shards 29583 threshold =0.1% breach: status: red, number_of_nodes: 2, unassigned_shards: 29577, number_of_pending_tasks: 709, number_of_in_flight_fetch: 0, timed_out: False, active_primary_shards: 534, task_max_waiting_in_queue_millis: 358092, cluster_name: relforge-eqiad, relocating_shards: 0, active_shards_percent_a
[08:17:03] ACKNOWLEDGEMENT - ElasticSearch health check for shards on relforge1002 is CRITICAL: CRITICAL - elasticsearch inactive shards 29835 threshold =0.1% breach: status: red, number_of_nodes: 2, unassigned_shards: 29829, number_of_pending_tasks: 402, number_of_in_flight_fetch: 0, timed_out: False, active_primary_shards: 282, task_max_waiting_in_queue_millis: 273758, cluster_name: relforge-eqiad, relocating_shards: 0, active_shards_percent_a
[08:22:27] !log alter localuser table in https://tendril.wikimedia.org/host/view/dbstore2001.codfw.wmnet/3306 - T141951
[08:22:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:26:25] (CR) Giuseppe Lavagetto: "https://puppet-compiler.wmflabs.org/4053/ shows the patch works correctly AFAICT" [puppet] - https://gerrit.wikimedia.org/r/310026 (owner: Giuseppe Lavagetto)
[08:27:05] <_joe_> akosiaris: I'd like to merge the volatile/ca patch, if that's ok with you
[08:30:50] (CR) Alexandros Kosiaris: [C: 1] puppetmaster: rsync volatile and ca dirs between puppetmasters [puppet] - https://gerrit.wikimedia.org/r/310026 (owner: Giuseppe Lavagetto)
[08:31:01] _joe_: ^
[08:31:14] <_joe_> cool
[08:31:16] Operations, Cassandra, Services: Renew RESTBase self-signed root certificate authority - https://phabricator.wikimedia.org/T143044#2631990 (fgiunchedi) Open>Resolved a:fgiunchedi Completed, both CAs for restbase production and staging cluster have been renewed and new certs issued.
[08:31:33] <_joe_> now let's see how many fixups this needs
[08:31:36] (CR) Giuseppe Lavagetto: [C: 2] puppetmaster: rsync volatile and ca dirs between puppetmasters [puppet] - https://gerrit.wikimedia.org/r/310026 (owner: Giuseppe Lavagetto)
[08:32:34] Operations, Performance-Team, Thumbor, Patch-For-Review: thumbor/exiftool deadlock, likely full pipe - https://phabricator.wikimedia.org/T144928#2632004 (Gilles) Open>Resolved
[08:32:37] Operations, Performance-Team, Thumbor, Patch-For-Review: add thumbor to production infrastructure - https://phabricator.wikimedia.org/T139606#2632006 (Gilles)
[08:33:17] !log renaming tables in db1015 - T145487
[08:33:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:33:38] <_joe_> akosiaris: puppet is disabled on palladium
[08:33:43] <_joe_> know anything about that
[08:33:44] <_joe_> ?
[08:35:37] <_joe_> akosiaris: I need to run puppet there, actually
[08:35:46] _joe_: yes
[08:36:00] _joe_: enable it freely
[08:36:04] I am done debugging
[08:36:04] <_joe_> ok
[08:38:16] (PS1) Ema: varnish: increase lru_interval [puppet] - https://gerrit.wikimedia.org/r/310240
[08:43:11] (PS1) Alexandros Kosiaris: puppetmaster: Add the lost user.email git setting [puppet] - https://gerrit.wikimedia.org/r/310243
[08:43:13] (PS1) Alexandros Kosiaris: puppetmaster: Specify To: header in private commit notification [puppet] - https://gerrit.wikimedia.org/r/310244
[08:43:42] (PS2) Alexandros Kosiaris: puppetmaster: Add the lost user.email git setting [puppet] - https://gerrit.wikimedia.org/r/310243
[08:43:47] (CR) Alexandros Kosiaris: [C: 2 V: 2] puppetmaster: Add the lost user.email git setting [puppet] - https://gerrit.wikimedia.org/r/310243 (owner: Alexandros Kosiaris)
[08:43:56] (PS2) Alexandros Kosiaris: puppetmaster: Specify To: header in private commit notification [puppet] - https://gerrit.wikimedia.org/r/310244
[08:44:01] (CR) Alexandros Kosiaris: [C: 2 V: 2] puppetmaster: Specify To: header in private commit notification [puppet] - https://gerrit.wikimedia.org/r/310244 (owner: Alexandros Kosiaris)
[08:45:11] (PS1) Gilles: Upgrade upstream code to 0.1.19 [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/310245
[08:45:13] (PS1) Gilles: Upgrade to 0.1.19 [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/310246
[08:46:42] !log relforge is taking more time than expected to recover after upgrade, most probably related to >10k indices that were created for test purpose
[08:46:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:51:12] gehel: lol @ 10k indices
[08:51:32] (CR) Filippo Giunchedi: [C: 2] Upgrade upstream code to 0.1.19 [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/310245 (owner: Gilles)
[08:51:37] (CR) Filippo Giunchedi: [C: 2] Upgrade to 0.1.19 [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/310246 (owner: Gilles)
[08:51:48] actually, it's >10k shards, only ~ 3k indices...
[08:52:07] still, I can't blame elasticsearch too much for taking some time...
[08:52:10] godog: it won't build, I've just seen that the test filename case change didn't make it into the commits because OS X sucks at this
[08:52:17] two test cases fail
[08:52:22] I'm fixing it right now
[08:52:23] !log relforge is taking more time than expected to recover after upgrade, most probably related to ~3k indices that were created for test purpose
[08:52:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:52:32] gilles: ah! ok still not merged
[08:52:51] (CR) Gilles: [C: -1] Upgrade upstream code to 0.1.19 [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/310245 (owner: Gilles)
[08:52:55] (CR) Gilles: [V: -1] Upgrade to 0.1.19 [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/310246 (owner: Gilles)
[08:53:00] (CR) Gilles: [C: -1] Upgrade to 0.1.19 [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/310246 (owner: Gilles)
[08:53:48] gehel: indeed, recovery is throttled too at a few indices concurrently by default too isn't it?
[08:54:32] godog: yep
[08:56:20] (CR) Muehlenhoff: "Nice, three more comments, at least two of those should be simple to resolve and the third can just as well be done in a follow up commit." (3 comments) [puppet] - https://gerrit.wikimedia.org/r/308520 (https://phabricator.wikimedia.org/T143536) (owner: Volans)
[08:56:22] problem is that elasticsearch is so busy initializing shards that updating settings takes time...
[08:58:55] (PS2) Gilles: Upgrade upstream code to 0.1.19 [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/310245
[08:59:13] (PS2) Gilles: Upgrade to 0.1.19 [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/310246
[09:00:18] godog: all good now
[09:02:04] (CR) Filippo Giunchedi: [C: 2] Upgrade upstream code to 0.1.19 [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/310245 (owner: Gilles)
[09:04:04] (PS3) Filippo Giunchedi: Upgrade upstream code to 0.1.19 [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/310245 (owner: Gilles)
[09:05:25] Operations, Beta-Cluster-Infrastructure, HHVM, Patch-For-Review: Move the MW Beta appservers to Debian - https://phabricator.wikimedia.org/T144006#2632056 (hashar)
[09:07:08] (PS3) Filippo Giunchedi: Upgrade to 0.1.19 [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/310246 (owner: Gilles)
[09:07:13] (CR) Filippo Giunchedi: [V: 2] Upgrade to 0.1.19 [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/310246 (owner: Gilles)
[09:07:37] (PS3) Gilles: Clean up Thumbor configuration [puppet] - https://gerrit.wikimedia.org/r/310024
[09:08:29] (PS2) Elukey: Add deployment-mediawiki04 to the deployment-prep Varnish config [puppet] - https://gerrit.wikimedia.org/r/310035 (https://phabricator.wikimedia.org/T144006)
[09:10:22] (CR) Elukey: [C: 2] Add deployment-mediawiki04 to the deployment-prep Varnish config [puppet] - https://gerrit.wikimedia.org/r/310035 (https://phabricator.wikimedia.org/T144006) (owner: Elukey)
[09:14:02] (CR) Volans: "@moritzm, see my inline replies" (3 comments) [puppet] - https://gerrit.wikimedia.org/r/308520 (https://phabricator.wikimedia.org/T143536) (owner: Volans)
[09:15:28] (PS4) Gilles: Clean up Thumbor configuration [puppet] - https://gerrit.wikimedia.org/r/310024
[09:16:08] (PS2) Filippo Giunchedi: Fix SLOW_PROCESSING_LIMIT [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/310015 (owner: Gilles)
[09:16:15] (CR) Filippo Giunchedi: [C: 2] Fix SLOW_PROCESSING_LIMIT [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/310015 (owner: Gilles)
[09:16:53] Operations, Beta-Cluster-Infrastructure, HHVM, Patch-For-Review: Move the MW Beta appservers to Debian - https://phabricator.wikimedia.org/T144006#2632096 (hashar)
[09:18:29] (PS1) Gilles: Removing TinyRGB config [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/310253
[09:19:43] (CR) Filippo Giunchedi: [C: 2] Fix SLOW_PROCESSING_LIMIT [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/310015 (owner: Gilles)
[09:19:54] (CR) Filippo Giunchedi: [C: 2] Removing TinyRGB config [debs/python-thumbor-wikimedia] - https://gerrit.wikimedia.org/r/310253 (owner: Gilles)
[09:22:43] Operations, Beta-Cluster-Infrastructure, HHVM, Patch-For-Review: Move the MW Beta appservers to Debian - https://phabricator.wikimedia.org/T144006#2632106 (hashar) After some mess with scap mwdeploy keys solved by running on deployment-tin: sudo -u jenkins-deploy -H SSH_AUTH_SOCK=/run/keyhol...
[09:25:12] Operations, Beta-Cluster-Infrastructure, HHVM, Patch-For-Review: Move the MW Beta appservers to Debian - https://phabricator.wikimedia.org/T144006#2632108 (hashar)
[09:26:52] Operations, Beta-Cluster-Infrastructure, HHVM, Patch-For-Review: Move the MW Beta appservers to Debian - https://phabricator.wikimedia.org/T144006#2586022 (hashar) Noticed in logstash: Warning: failed to mkdir "/srv/mediawiki/php-master/images/thumb/2/20/Order_of_St_John_(UK)_ribbon.png"...
[09:27:25] (CR) Muehlenhoff: "Looks good, but I would rather use the ports 32768/32769 as the logical continuation of commit ef9d458d3c826 (you should be able to avoid " [puppet] - https://gerrit.wikimedia.org/r/310059 (owner: ArielGlenn)
[09:33:31] (PS3) ArielGlenn: set up static nfs lock manager ports for dataset hosts [puppet] - https://gerrit.wikimedia.org/r/310059
[09:34:28] Operations, Beta-Cluster-Infrastructure, HHVM, Patch-For-Review: Move the MW Beta appservers to Debian - https://phabricator.wikimedia.org/T144006#2632143 (hashar) >>! In T144006#2632108, @hashar wrote: > Noticed in logstash: > > Warning: failed to mkdir "/srv/mediawiki/php-master/images/thu...
[09:34:31] (PS5) Filippo Giunchedi: Clean up Thumbor configuration [puppet] - https://gerrit.wikimedia.org/r/310024 (owner: Gilles)
[09:34:52] (PS1) Elukey: Replace mediawiki01 with mediawiki04 (Debian Jessie) in deployment-prep [puppet] - https://gerrit.wikimedia.org/r/310256 (https://phabricator.wikimedia.org/T144006)
[09:37:00] (CR) Hashar: [C: 1] "Good, once merged and puppet ran on deployment-tin and deployment-cache-text04 we can shutdown the instance deployment-mediawiki01." [puppet] - https://gerrit.wikimedia.org/r/310256 (https://phabricator.wikimedia.org/T144006) (owner: Elukey)
[09:37:26] (CR) Elukey: [C: 2] Replace mediawiki01 with mediawiki04 (Debian Jessie) in deployment-prep [puppet] - https://gerrit.wikimedia.org/r/310256 (https://phabricator.wikimedia.org/T144006) (owner: Elukey)
[09:39:03] (CR) Filippo Giunchedi: [C: 2] Clean up Thumbor configuration [puppet] - https://gerrit.wikimedia.org/r/310024 (owner: Gilles)
[09:39:08] (PS6) Filippo Giunchedi: Clean up Thumbor configuration [puppet] - https://gerrit.wikimedia.org/r/310024 (owner: Gilles)
[09:39:13] (PS13) Volans: Automation: automatically reimage host [puppet] - https://gerrit.wikimedia.org/r/308520 (https://phabricator.wikimedia.org/T143536)
[09:39:54] !log alter localuser table in dbstore2002 - T141951
[09:39:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:49:48] (CR) Muehlenhoff: [C: 1] "Looks good to me" [puppet] - https://gerrit.wikimedia.org/r/310059 (owner: ArielGlenn)
[09:56:21] (PS1) Filippo Giunchedi: Revert "swift: disable thumbor shadow traffic" [puppet] - https://gerrit.wikimedia.org/r/310259 (https://phabricator.wikimedia.org/T139606)
[10:04:36] (CR) Gilles: [C: 1] Revert "swift: disable thumbor shadow traffic" [puppet] - https://gerrit.wikimedia.org/r/310259 (https://phabricator.wikimedia.org/T139606) (owner: Filippo Giunchedi)
[10:05:18] Operations, DBA: Drop PovWatch extension-related database tables from Wikimedia wikis - https://phabricator.wikimedia.org/T54924#2632199 (Marostegui) a:Marostegui
[10:05:33] (CR) ArielGlenn: "a dry run from labcontrol1001 produces:" [puppet] - https://gerrit.wikimedia.org/r/309709 (https://phabricator.wikimedia.org/T123607) (owner: Alex Monk)
[10:09:09] (CR) Filippo Giunchedi: [C: 2] Revert "swift: disable thumbor shadow traffic" [puppet] - https://gerrit.wikimedia.org/r/310259 (https://phabricator.wikimedia.org/T139606) (owner: Filippo Giunchedi)
[10:09:39] Operations, Community-Tech, MediaWiki-CrossWikiWatchlist, hardware-requests, Crosswiki: Acquire new hardware for hosting cross-wiki watchlist database - https://phabricator.wikimedia.org/T142538#2632214 (jcrespo) We probably will need a protoype on labs, and another in production that I can s...
[10:10:26] !log enable shadow requests to thumbor for small wikis T139606
[10:10:29] (CR) BBlack: [C: 1] varnish: increase lru_interval [puppet] - https://gerrit.wikimedia.org/r/310240 (owner: Ema)
[10:10:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:10:46] (PS2) Ema: varnish: increase lru_interval [puppet] - https://gerrit.wikimedia.org/r/310240
[10:10:50] ema bblack ^ JFYI, unlikely it is going to make a difference from your POV
[10:10:51] (CR) Ema: [C: 2 V: 2] varnish: increase lru_interval [puppet] - https://gerrit.wikimedia.org/r/310240 (owner: Ema)
[10:12:32] (PS1) Hashar: beta: change canary mw server from 01 to 04 [puppet] - https://gerrit.wikimedia.org/r/310264 (https://phabricator.wikimedia.org/T144006)
[10:14:23] (CR) Elukey: [C: 2] beta: change canary mw server from 01 to 04 [puppet] - https://gerrit.wikimedia.org/r/310264 (https://phabricator.wikimedia.org/T144006) (owner: Hashar)
[10:31:32] (PS1) Ema: varnish: bump nuke_limit [puppet] - https://gerrit.wikimedia.org/r/310266 (https://phabricator.wikimedia.org/T131502)
[10:32:21] (PS2) Muehlenhoff: Remove mw2061-mw2074 from DNS [dns] - https://gerrit.wikimedia.org/r/310002 (https://phabricator.wikimedia.org/T144745)
[10:33:06] (CR) Muehlenhoff: [C: 2] Remove mw2061-mw2074 from DNS [dns] - https://gerrit.wikimedia.org/r/310002 (https://phabricator.wikimedia.org/T144745) (owner: Muehlenhoff)
[10:34:42] (PS1) Filippo Giunchedi: thumbor: enable most wikis from top30, excluding commons [puppet] - https://gerrit.wikimedia.org/r/310268 (https://phabricator.wikimedia.org/T139606)
[10:36:02] (PS1) Giuseppe Lavagetto: base::puppet: allow using srv records [puppet] - https://gerrit.wikimedia.org/r/310269
[10:36:04] (PS1) Giuseppe Lavagetto: tcpircbot: add the new puppetmaster frontends, remove palladium [puppet] - https://gerrit.wikimedia.org/r/310270
[10:36:06] (PS1) Giuseppe Lavagetto: varnish: point pybal_config to all puppetmaster frontends [puppet] - https://gerrit.wikimedia.org/r/310271
[10:36:08] (PS1) Giuseppe Lavagetto: puppetmaster: Allow new puppetmasters to work with wmf-reimage [puppet] - https://gerrit.wikimedia.org/r/310272
[10:36:10] Operations, Patch-For-Review: Remove mw2061-mw2074 - https://phabricator.wikimedia.org/T144745#2632292 (MoritzMuehlenhoff) a:MoritzMuehlenhoff>Papaul mw2061-mw2074 have been removed from puppet, Icinga, conftool and Salt and were powered down. They have also been removed from DNS (except the mgmt...
[10:36:10] (PS1) Giuseppe Lavagetto: puppet: use srv records to find the puppetmaster [puppet] - https://gerrit.wikimedia.org/r/310273
[10:36:12] (PS1) Giuseppe Lavagetto: puppetmaster: switch ca_server to be puppetmaster1001 [puppet] - https://gerrit.wikimedia.org/r/310274
[10:36:23] Operations, Patch-For-Review: Decomission mw2061-mw2074 - https://phabricator.wikimedia.org/T144745#2632294 (MoritzMuehlenhoff)
[10:36:25] <_joe_> akosiaris: here you are, sir
[10:36:29] <_joe_> ^^ :)
[10:38:37] (CR) Muehlenhoff: [C: 1] "Now acked in Ops meeting" [puppet] - https://gerrit.wikimedia.org/r/308544 (https://phabricator.wikimedia.org/T144726) (owner: Elukey)
[10:39:23] (PS2) Filippo Giunchedi: thumbor: enable most wikis from top30, excluding commons [puppet] - https://gerrit.wikimedia.org/r/310268 (https://phabricator.wikimedia.org/T139606)
[10:40:10] Operations, Continuous-Integration-Infrastructure (phase-out-gallium): Upgrade Zuul on scandium.eqiad.wmnet (Jessie zuul-merger) - https://phabricator.wikimedia.org/T145057#2632303 (hashar)
[10:40:24] (PS2) Giuseppe Lavagetto: service::node: restrict readability of configurations. [puppet] - https://gerrit.wikimedia.org/r/309522
[10:43:39] (PS3) Giuseppe Lavagetto: JobRunners: Remove the RESTBase runners [puppet] - https://gerrit.wikimedia.org/r/309298 (https://phabricator.wikimedia.org/T144843) (owner: Mobrovac)
[10:44:41] (CR) Giuseppe Lavagetto: [C: 2 V: 2] JobRunners: Remove the RESTBase runners [puppet] - https://gerrit.wikimedia.org/r/309298 (https://phabricator.wikimedia.org/T144843) (owner: Mobrovac)
[10:45:07] !log uploaded 2.5.0-8-gcbc7f62-wmf2jessie1 to jessie-wikimedia/thirdparty (T145057)
[10:45:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:47:11] (CR) Filippo Giunchedi: [C: 2] thumbor: enable most wikis from top30, excluding commons [puppet] - https://gerrit.wikimedia.org/r/310268 (https://phabricator.wikimedia.org/T139606) (owner: Filippo Giunchedi)
[10:47:20] (PS3) Filippo Giunchedi: thumbor: enable most wikis from top30, excluding commons [puppet] - https://gerrit.wikimedia.org/r/310268 (https://phabricator.wikimedia.org/T139606)
[10:48:25] !log zuul upgraded to zuul_2.5.0-8-gcbc7f62-wmf2jessie1 on scandium (T145057)
[10:48:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[10:51:43] (PS2) Giuseppe Lavagetto: Citoid: Use Scap3 for config deploys [puppet] - https://gerrit.wikimedia.org/r/310021 (https://phabricator.wikimedia.org/T144597) (owner: Mobrovac)
[10:51:45] Operations, Continuous-Integration-Infrastructure (phase-out-gallium), Patch-For-Review: Upgrade Zuul on scandium.eqiad.wmnet (Jessie zuul-merger) - https://phabricator.wikimedia.org/T145057#2632323 (elukey) Open>Resolved a:elukey Package installed and uploaded to jessie-wikimedia/thirdp...
[10:52:44] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] Citoid: Use Scap3 for config deploys [puppet] - 10https://gerrit.wikimedia.org/r/310021 (https://phabricator.wikimedia.org/T144597) (owner: 10Mobrovac) [10:54:30] (03CR) 10Muehlenhoff: [C: 031] "Thanks for working on this. Let's merge this to give it some exposure on real world reimagings :-)" [puppet] - 10https://gerrit.wikimedia.org/r/308520 (https://phabricator.wikimedia.org/T143536) (owner: 10Volans) [10:58:21] !log finished rolling restart swift-proxy for thumbor change T139606 [10:58:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:58:33] !log citoid deployed e79430f for T144597 [10:58:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [10:59:56] stashbot's not here? [11:01:27] (03PS14) 10Volans: Automation: automatically reimage host [puppet] - 10https://gerrit.wikimedia.org/r/308520 (https://phabricator.wikimedia.org/T143536) [11:02:05] (03PS4) 10Elukey: Add the druid-admins group for the Analytics team [puppet] - 10https://gerrit.wikimedia.org/r/308544 (https://phabricator.wikimedia.org/T144726) [11:02:42] (03CR) 10Volans: [C: 032] Automation: automatically reimage host [puppet] - 10https://gerrit.wikimedia.org/r/308520 (https://phabricator.wikimedia.org/T143536) (owner: 10Volans) [11:03:19] moritzm: I rebased/solved-conflicts of https://gerrit.wikimedia.org/r/308544, if you are ok I'd merge it [11:03:47] sorry elukey you need to rebase again ;) [11:03:58] elukey: sure, I +1d it an hour ago or so [11:04:09] it was only -1d since it did not have ops meeting approval [11:04:18] volans: :D [11:04:30] (03PS5) 10Elukey: Add the druid-admins group for the Analytics team [puppet] - 10https://gerrit.wikimedia.org/r/308544 (https://phabricator.wikimedia.org/T144726) [11:05:10] volans: for that commit I will not be angry at you since it is a great job [11:05:37] now the punishment is to reimage 10 mw eqiad hosts :P [11:06:06] I 
discovered a bug in puppet on my merged stuff, sending a fix soon [11:06:13] (03CR) 10Elukey: [C: 032] Add the druid-admins group for the Analytics team [puppet] - 10https://gerrit.wikimedia.org/r/308544 (https://phabricator.wikimedia.org/T144726) (owner: 10Elukey) [11:07:56] 06Operations, 06Performance-Team, 10Thumbor: AttributeError: 'Engine' object has no attribute 'exif' - https://phabricator.wikimedia.org/T145504#2632355 (10Gilles) [11:12:00] PROBLEM - puppet last run on neodymium is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 seconds ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/sbin/wmf-auto-reimage] [11:12:14] 06Operations, 06Performance-Team, 10Thumbor: 'NoneType' object has no attribute 'lstrip' - https://phabricator.wikimedia.org/T145505#2632375 (10Gilles) [11:12:24] 06Operations, 06Performance-Team, 10Thumbor: 'NoneType' object has no attribute 'lstrip' - https://phabricator.wikimedia.org/T145505#2632390 (10Gilles) [11:14:07] (03PS1) 10Volans: Salt: fix source path typo [puppet] - 10https://gerrit.wikimedia.org/r/310278 (https://phabricator.wikimedia.org/T143536) [11:14:11] it's me, fixing already *** [11:15:26] (03CR) 10Volans: [C: 032] Salt: fix source path typo [puppet] - 10https://gerrit.wikimedia.org/r/310278 (https://phabricator.wikimedia.org/T143536) (owner: 10Volans) [11:15:34] 06Operations, 10ChangeProp, 10DBA, 10MediaWiki-API, and 4 others: Investigate slow transcludedin query - https://phabricator.wikimedia.org/T145079#2632395 (10jcrespo) a:03jcrespo [11:15:46] (03PS1) 10Elukey: Add journalctl capabilities to the druid-admins group [puppet] - 10https://gerrit.wikimedia.org/r/310279 (https://phabricator.wikimedia.org/T144726) [11:17:11] RECOVERY - puppet last run on neodymium is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [11:18:38] !log reimaging mw2198 and mw2199 to test the automation script T143536 [11:18:44] Logged the message at 
https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [11:18:51] volans, see puppet error on labscontrol [11:19:08] jynus: ack [11:20:02] thanks [11:20:06] 06Operations, 06Performance-Team, 10Thumbor: 'NoneType' object has no attribute 'lstrip' - https://phabricator.wikimedia.org/T145505#2632417 (10Gilles) Another example: ``` Sep 13 11:18:32 thumbor1001 thumbor@8828[71306]: thumbor:ERROR [BaseHandler] get_image failed for url `http%3A//ms-fe.svc.eqiad.wmnet/... [11:21:17] (03PS1) 10Filippo Giunchedi: thumbor: add prometheus::node_exporter [puppet] - 10https://gerrit.wikimedia.org/r/310281 (https://phabricator.wikimedia.org/T139606) [11:21:19] (03PS1) 10Filippo Giunchedi: swift: enable thumbor on commons [puppet] - 10https://gerrit.wikimedia.org/r/310282 (https://phabricator.wikimedia.org/T139606) [11:21:25] jynus: fixed thanks [11:25:20] (03CR) 10Filippo Giunchedi: [C: 032] thumbor: add prometheus::node_exporter [puppet] - 10https://gerrit.wikimedia.org/r/310281 (https://phabricator.wikimedia.org/T139606) (owner: 10Filippo Giunchedi) [11:25:30] (03PS2) 10Filippo Giunchedi: thumbor: add prometheus::node_exporter [puppet] - 10https://gerrit.wikimedia.org/r/310281 (https://phabricator.wikimedia.org/T139606) [11:25:52] (03CR) 10Giuseppe Lavagetto: [C: 04-2] "You should make the output of your service go to a specified syslog file, via an rsyslog rule, and make that file/dir readable by druid-ad" [puppet] - 10https://gerrit.wikimedia.org/r/310279 (https://phabricator.wikimedia.org/T144726) (owner: 10Elukey) [11:26:42] _joe_ ah ok didn't know it! 
thanks --^ [11:27:57] (03Abandoned) 10Elukey: Add journalctl capabilities to the druid-admins group [puppet] - 10https://gerrit.wikimedia.org/r/310279 (https://phabricator.wikimedia.org/T144726) (owner: 10Elukey) [11:28:23] (03CR) 10KartikMistry: "recheck" [debs/contenttranslation/apertium-kaz] - 10https://gerrit.wikimedia.org/r/296366 (https://phabricator.wikimedia.org/T107306) (owner: 10KartikMistry) [11:31:33] (03PS1) 10Volans: Salt: include password module [puppet] - 10https://gerrit.wikimedia.org/r/310283 (https://phabricator.wikimedia.org/T143536) [11:33:35] (03CR) 10Volans: [C: 032] "Fix confirmed:" [puppet] - 10https://gerrit.wikimedia.org/r/310283 (https://phabricator.wikimedia.org/T143536) (owner: 10Volans) [11:33:53] (03PS2) 10Volans: Salt: include password module [puppet] - 10https://gerrit.wikimedia.org/r/310283 (https://phabricator.wikimedia.org/T143536) [11:34:04] (03CR) 10Volans: [V: 032] Salt: include password module [puppet] - 10https://gerrit.wikimedia.org/r/310283 (https://phabricator.wikimedia.org/T143536) (owner: 10Volans) [11:36:21] 06Operations, 06Operations-Software-Development, 07HHVM, 13Patch-For-Review: Upgrade all mw* servers to debian jessie - https://phabricator.wikimedia.org/T143536#2571031 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by volans on neodymium.eqiad.wmnet for hosts: ``` ['mw2198.codfw.wmnet', 'mw2... [11:36:56] moritzm, elukey: https://phabricator.wikimedia.org/T143536#2632462 ;) [11:37:40] ok :-) I'll give it a shot myself now [11:41:54] <_joe_> volans: is palladium in the config of your script somewhere? [11:42:22] _joe_: no I resolve the CNAME of puppet.wikimedia.org [11:42:33] <_joe_> which is wrong :) [11:42:41] meaning? 
[11:42:45] <_joe_> I mean we're not going to use it very soon [11:42:57] <_joe_> we're gonna use the SRV record [11:43:04] <_joe_> which I added some time agon [11:43:07] <_joe_> *ago [11:44:00] <_joe_> _x-puppet._tcp.$::site.wmnet [11:44:10] <_joe_> I know you're now loving me :D [11:44:27] (03PS1) 10Jdrewniak: Bumping portals to master - adding search language to event logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/310287 [11:45:04] (03CR) 10KartikMistry: "recheck" [debs/contenttranslation/apertium-kaz] - 10https://gerrit.wikimedia.org/r/296366 (https://phabricator.wikimedia.org/T107306) (owner: 10KartikMistry) [11:45:16] lol [11:47:26] ok I'll update it [11:48:02] will they be equivalent (eqiad and codfw ones)? [11:49:35] !log restarting elasticseaarch on relforge1001 - OOM heap space [11:49:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [11:52:08] 06Operations, 06Operations-Software-Development, 07HHVM, 13Patch-For-Review: Upgrade all mw* servers to debian jessie - https://phabricator.wikimedia.org/T143536#2632502 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by jmm on neodymium.eqiad.wmnet for hosts: ``` ['mw2100.codfw.wmnet'] ``` The... [11:55:42] niceeeee --^ [12:06:00] !log zotero translators deploying cad95af [12:06:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:17:01] (03CR) 10KartikMistry: "recheck" [debs/contenttranslation/apertium-kaz] - 10https://gerrit.wikimedia.org/r/296366 (https://phabricator.wikimedia.org/T107306) (owner: 10KartikMistry) [12:21:04] 06Operations, 10ContentTranslation-CXserver, 10ContentTranslation-Deployments, 10MediaWiki-extensions-ContentTranslation, and 5 others: Package apertium (and dependencies) for Jessie - https://phabricator.wikimedia.org/T107306#2632610 (10Amire80) [12:24:09] !log putting db1024 under maintenance (potential lag, etc.) 
to test solutions for T145079 [12:24:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:29:42] aude: hashar Dereckson I can do EU swat today :) [12:29:57] fyi [12:34:22] PROBLEM - Apache HTTP on mw2100 is CRITICAL: Connection timed out [12:35:34] PROBLEM - puppet last run on mw2100 is CRITICAL: Timeout while attempting connection [12:35:52] PROBLEM - salt-minion processes on mw2100 is CRITICAL: Timeout while attempting connection [12:36:56] mw2100 has been reimaged [12:37:13] PROBLEM - Check size of conntrack table on mw2100 is CRITICAL: Timeout while attempting connection [12:37:19] PROBLEM - DPKG on mw2100 is CRITICAL: Timeout while attempting connection [12:37:28] silenced [12:38:12] did the same with mw219[89] [12:39:04] RECOVERY - Apache HTTP on mw2100 is OK: HTTP OK: HTTP/1.1 200 OK - 10975 bytes in 0.073 second response time [12:39:22] RECOVERY - Check size of conntrack table on mw2100 is OK: OK: nf_conntrack is 0 % full [12:39:33] RECOVERY - DPKG on mw2100 is OK: All packages OK [12:40:33] RECOVERY - salt-minion processes on mw2100 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [12:40:36] (03CR) 10Filippo Giunchedi: "this is almost already the case in production, exception being bast3001" [puppet] - 10https://gerrit.wikimedia.org/r/309995 (owner: 10Filippo Giunchedi) [12:44:15] (03CR) 10Alexandros Kosiaris: [C: 031] "fine by me. Should we reimage bast3001 to conform to that ?" [puppet] - 10https://gerrit.wikimedia.org/r/309995 (owner: 10Filippo Giunchedi) [12:46:14] 06Operations, 06Operations-Software-Development, 07HHVM, 13Patch-For-Review: Upgrade all mw* servers to debian jessie - https://phabricator.wikimedia.org/T143536#2632712 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['mw2198.codfw.wmnet', 'mw2199.codfw.wmnet'] ``` Those hosts were successful:... 
[12:47:01] !log alter localuser table in db2040 - T141951 [12:47:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [12:50:40] (03CR) 10KartikMistry: "recheck" [debs/contenttranslation/apertium-kaz] - 10https://gerrit.wikimedia.org/r/296366 (https://phabricator.wikimedia.org/T107306) (owner: 10KartikMistry) [12:55:36] (03CR) 10KartikMistry: "recheck" [debs/contenttranslation/apertium-kaz] - 10https://gerrit.wikimedia.org/r/296366 (https://phabricator.wikimedia.org/T107306) (owner: 10KartikMistry) [12:58:03] (03PS3) 10KartikMistry: apertium-kaz: New upstream release and rebuild for Jessie [debs/contenttranslation/apertium-kaz] - 10https://gerrit.wikimedia.org/r/296366 (https://phabricator.wikimedia.org/T107306) [12:58:23] (03CR) 10jenkins-bot: [V: 04-1] apertium-kaz: New upstream release and rebuild for Jessie [debs/contenttranslation/apertium-kaz] - 10https://gerrit.wikimedia.org/r/296366 (https://phabricator.wikimedia.org/T107306) (owner: 10KartikMistry) [12:58:36] hashar: what's the plan to swat today? [12:58:45] 06Operations, 06Operations-Software-Development, 07HHVM, 13Patch-For-Review: Upgrade all mw* servers to debian jessie - https://phabricator.wikimedia.org/T143536#2632739 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['mw2100.codfw.wmnet'] ``` Those hosts were successful: ``` ['mw2100.codfw.wm... [13:00:04] hashar, Dereckson, addshore, and aude: Dear anthropoid, the time has come. Please deploy European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160913T1300). [13:00:04] Addshore, hashar, Urbanecm, MatmaRex, kart_, and jan_drewniak: A patch you scheduled for European Mid-day SWAT(Max 8 patches) is about to be deployed. Please be available during the process. 
[13:00:10] o/ [13:00:11] Present [13:00:14] o/ [13:00:42] o/ [13:00:44] RECOVERY - puppet last run on mw2100 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [13:00:47] you are prolific :] [13:01:10] * kart_ here [13:01:12] I'll start then with yours hashar ? [13:01:15] addshore: be bold and do https://gerrit.wikimedia.org/r/#/c/305653/4 ! :D [13:01:20] that's a lot of patches suddenly. [13:01:27] will do mine last if there are enough time. it is not important [13:01:32] ahh okay! [13:01:46] (03PS5) 10Hashar: Enable RevisionSlider BetaFeature on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/305653 (https://phabricator.wikimedia.org/T143421) (owner: 10Addshore) [13:01:48] My patch can be rescheduled if needed too. [13:01:51] (03CR) 10Hashar: [C: 032] "SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/305653 (https://phabricator.wikimedia.org/T143421) (owner: 10Addshore) [13:01:54] addshore: it is all your [13:02:01] hashar: ack [13:02:02] I am going to rebase all the other patches on top of adam one [13:02:24] (03Merged) 10jenkins-bot: Enable RevisionSlider BetaFeature on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/305653 (https://phabricator.wikimedia.org/T143421) (owner: 10Addshore) [13:03:09] my patch is in debug logging for a bug i can't reproduce, so i can't test it, but it's also perfectly safe to deploy. 
[13:03:22] syncing [13:03:47] (03PS3) 10Hashar: Deploy Compact Language Links out of beta for Tulu Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/305997 (owner: 10KartikMistry) [13:03:48] mass rebase [13:03:49] (03PS3) 10Hashar: Stop logging xff from 127.0.0.1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/301339 (https://phabricator.wikimedia.org/T129982) [13:03:51] (03PS4) 10Hashar: Fix illegal wgFlaggedRevsWhitelist for arwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309514 (https://phabricator.wikimedia.org/T144673) (owner: 10Urbanecm) [13:03:53] (03PS2) 10Hashar: Bumping portals to master - adding search language to event logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/310287 (owner: 10Jdrewniak) [13:04:09] !log addshore@tin Synchronized wmf-config/InitialiseSettings.php: SWAT: [[gerrit:305653|Enable RevisionSlider BetaFeature on all wikis]] (duration: 00m 49s) [13:04:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:04:34] MatmaRex: going to do your UploadWizard patch and just sync it to the whole cluster [13:04:43] then do the config changes one by one [13:04:45] that one all looks good, hashar +2 the next one and I will sync? 
[13:04:54] hashar: oh, thanks [13:04:56] ok* [13:05:08] (03CR) 10Alexandros Kosiaris: [C: 031] "looks ok, minor comment inline" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/310269 (owner: 10Giuseppe Lavagetto) [13:05:18] (03CR) 10Hashar: [C: 032] "SWAT ping addshore :D" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/305997 (owner: 10KartikMistry) [13:05:33] (03CR) 10Alexandros Kosiaris: [C: 031] tcpircbot: add the new puppetmaster frontends, remove palladium [puppet] - 10https://gerrit.wikimedia.org/r/310270 (owner: 10Giuseppe Lavagetto) [13:05:46] ah no wrong order [13:05:50] xD [13:05:57] (03CR) 10Hashar: Deploy Compact Language Links out of beta for Tulu Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/305997 (owner: 10KartikMistry) [13:06:16] I am trying to optimize only to confuse myself [13:06:26] thats why I let you do the +2 ;) [13:06:32] (03CR) 10Alexandros Kosiaris: [C: 031] varnish: point pybal_config to all puppetmaster frontends [puppet] - 10https://gerrit.wikimedia.org/r/310271 (owner: 10Giuseppe Lavagetto) [13:06:45] (03CR) 10Hashar: [C: 032] "That is annoying!! SWAT" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309514 (https://phabricator.wikimedia.org/T144673) (owner: 10Urbanecm) [13:07:07] hashar: annoying SWAT? ;) [13:07:08] urandom: looks like flaggedrevs / that issue never got it [13:07:11] (03Merged) 10jenkins-bot: Fix illegal wgFlaggedRevsWhitelist for arwiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/309514 (https://phabricator.wikimedia.org/T144673) (owner: 10Urbanecm) [13:07:14] got hit [13:07:29] urandom: oh no sorry. 
The annoyance is the bad unicode in the file :]] [13:07:36] its on mw1099 [13:07:47] I dont think we can test it [13:07:55] ack, syncing [13:08:00] (03CR) 10Alexandros Kosiaris: [C: 04-1] "changed to -1" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/310271 (owner: 10Giuseppe Lavagetto) [13:08:01] If the message prefixed by "urandom" should be prefixed "Urbanecm", I don't know how to test it too... [13:08:06] kart_: is your patch testable on mw1099 ( https://gerrit.wikimedia.org/r/#/c/305997/3 ) ? [13:08:09] I can only test if the wiki will work. [13:08:18] (03CR) 10Hashar: [C: 032] "SWAT for kart_" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/305997 (owner: 10KartikMistry) [13:08:38] !log addshore@tin Synchronized wmf-config/flaggedrevs.php: SWAT: [[gerrit:309514|Fix illegal wgFlaggedRevsWhitelist for arwiki]] (duration: 00m 47s) [13:08:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:08:44] (03Merged) 10jenkins-bot: Deploy Compact Language Links out of beta for Tulu Wikipedia [mediawiki-config] - 10https://gerrit.wikimedia.org/r/305997 (owner: 10KartikMistry) [13:08:47] well it is synced Urbanecm :) [13:08:49] hashar: testing [13:08:49] (03CR) 10Alexandros Kosiaris: [C: 031] "ok, but let's test this right after merging" [puppet] - 10https://gerrit.wikimedia.org/r/310272 (owner: 10Giuseppe Lavagetto) [13:08:51] Thanks. [13:08:55] Going to close the task. [13:08:59] thank you ! [13:09:04] My pleasure : [13:09:05] ) [13:09:22] Deploy Compact Language Links out of beta for Tulu Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/305997 (owner: KartikMistry) is on mw1099 [13:09:24] (03CR) 10Alexandros Kosiaris: [C: 04-1] "hmm let's test this on a few hosts first ?" 
[puppet] - 10https://gerrit.wikimedia.org/r/310273 (owner: 10Giuseppe Lavagetto) [13:09:38] hashar: looks good [13:09:39] (03CR) 10Alexandros Kosiaris: [C: 031] puppetmaster: switch ca_server to be puppetmaster1001 [puppet] - 10https://gerrit.wikimedia.org/r/310274 (owner: 10Giuseppe Lavagetto) [13:09:41] addshore: ^^ [13:09:45] syncing [13:09:47] addshore: looks good. go ahead. [13:10:13] (03CR) 10Hashar: "SWAT for jan_drewniak" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/310287 (owner: 10Jdrewniak) [13:10:16] (03CR) 10Hashar: [C: 032] Bumping portals to master - adding search language to event logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/310287 (owner: 10Jdrewniak) [13:10:20] so portal [13:10:34] !log addshore@tin Synchronized dblists/clldefault.dblist: SWAT: [[gerrit:305997|Deploy Compact Language Links out of beta for Tulu Wikipedia]] (duration: 00m 46s) [13:10:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:10:41] (03Merged) 10jenkins-bot: Bumping portals to master - adding search language to event logging [mediawiki-config] - 10https://gerrit.wikimedia.org/r/310287 (owner: 10Jdrewniak) [13:11:15] jan_drewniak: is this testable on mw1099? [13:11:25] is that all about git submodule update portals [13:11:29] then scap sync-dir portals ? [13:12:31] looks like it is mostly localization updates [13:12:55] hashar: yeah, it's testable, and it looks good to me [13:13:26] that is for https://www.wikipedia.org/ isn't it ? 
[13:13:39] thanks addshore and hashar [13:14:11] so no l10n stuff is needed [13:14:30] yeah it is purely json [13:14:35] afaik [13:15:01] hashar: yeah localization on www.wikipedia.org is client-side, just json [13:15:09] syncing [13:15:09] okay *goes to sync* [13:15:12] ahh :) [13:15:15] cause at least it looks good on my desktop [13:15:26] https://gerrit.wikimedia.org/r/#/c/301339/ :D [13:15:35] that one is for the jobrunners mainly [13:15:57] !log hashar@tin Synchronized portals: Bumping portals to master (duration: 00m 50s) [13:16:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:16:03] hashar: just you and https://gerrit.wikimedia.org/r/#/c/310180/ left? [13:16:28] yeah [13:16:29] I'll go ahead and do MatmaRex's patch now! [13:16:32] and that cant be tested on mw1099 [13:16:37] gotta pull it on the jobrunners [13:16:46] watch how that behave in fluorine [13:17:17] your one cant be tested on mw1099, can MatmaRex's? [13:17:23] * hashar hunts for list of jobrunners servers [13:17:37] uploadWizard I dont think it can [13:18:20] addshore: nope [13:18:26] syncing [13:18:30] \o/ [13:18:31] i mean, i can verify that it doesn't break anything. which it doesn't. [13:18:41] yeah it will be fine [13:18:43] but i can't test the fix, since i can't reproduce the conditions [13:18:46] :) [13:19:13] !log addshore@tin Synchronized php-1.28.0-wmf.18/extensions/UploadWizard/resources/uw.EventFlowLogger.js: SWAT: [[gerrit:310180|uw.EventFlowLogger: Fix NS_ERROR_NOT_AVAILABLE debug logging (duration: 00m 49s) [13:19:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:19:18] hashar: just you left then! :) [13:20:04] mind +2 ing it ? 
[13:20:08] ack [13:20:08] bd808 reviewed it [13:20:11] I will then tail -F xff.log|grep ', 127.0.0.1'|cut -d\ -f4- [13:20:18] and scap pull some jobrunners [13:20:19] (03CR) 10Addshore: [C: 032] Stop logging xff from 127.0.0.1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/301339 (https://phabricator.wikimedia.org/T129982) (owner: 10Hashar) [13:20:48] (03Merged) 10jenkins-bot: Stop logging xff from 127.0.0.1 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/301339 (https://phabricator.wikimedia.org/T129982) (owner: 10Hashar) [13:21:09] will do the mw13 something [13:21:57] !log Pulling "Stop logging xff from 127.0.0.1" patch on mw1300-1303 T129982 [13:22:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:22:35] it is like magic [13:22:58] much quieter? [13:23:03] yeah [13:23:06] [= [13:23:40] tail -F xff.log |grep mw130 [13:23:57] spam is gone [13:24:37] nice [13:24:45] will watch the job queue related board [13:24:52] https://grafana.wikimedia.org/dashboard/db/job-queue-health https://grafana.wikimedia.org/dashboard/db/job-queue-rate [13:24:53] meanwhile [13:24:58] it is quite impressive to pair the SWAT [13:25:02] that is MUCH FASTER [13:25:44] yeh, 30 mins and 6 patches done, although there wasn't much waiting around for testing [13:26:14] and, of course, it's always good to know 2 people are looking at everything... [13:28:03] !log Pulling "Stop logging xff from 127.0.0.1" patch on mw1299 and mw1161-mw1169 T129982 [13:28:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:28:29] just curious hashar, but for those pulls, you're just going to each box and doing "scap pull" right?
[13:28:44] for i in 1 2 3 4 5 6 7 8 9; do ssh mw116$i.eqiad.wmnet scap pull; done; [13:28:57] (one could use "seq" but I keep screwing it up) [13:29:15] there might be a way to do that with scap or dsh [13:29:16] yup [13:29:24] but really, it is much simpler to just ssh from my box via the bastion [13:29:30] and figure out the proper command on my local box [13:29:51] hmm spam gone entirely [13:29:55] fatalmonitor is happy [13:29:59] jobs are still processed [13:30:17] xff.log is down to a manageable level? [13:30:17] great! [13:30:39] well, it's still pretty full, but much better... [13:30:44] though there is still an insane rate of logs for the api call :] [13:31:22] wtp1004.eqiad.wmnet [13:31:23] bah [13:31:28] wonder how helpful it is [13:31:38] that is all internal traffic as well [13:33:54] !log hashar@tin Synchronized wmf-config/CommonSettings.php: Stop logging xff from 127.0.0.1 T129982 (duration: 00m 47s) [13:33:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:34:03] 06Operations, 10Wikimedia-Site-requests, 07Wikimedia-log-errors: Requests to localhost spam the 'localhost' and 'xff' log buckets - https://phabricator.wikimedia.org/T129982#2632797 (10hashar) 05Open>03Resolved The spam traffic from the job runner is gone. There is still a lot of spam from internal box,... [13:35:34] we had a variable to discard those xff [13:35:42] some list of whitelisted ip [13:37:30] mobrovac: any idea why Parsoid hits the MediaWiki API so much?
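(Note on the "seq" remark at 13:28:57: the per-host loop pasted at 13:28:44 could be written with seq roughly as below. This is only a sketch, not what was actually run. The host range mw1161-mw1169 comes from the !log entry at 13:28:03; the DRY_RUN switch and the function names are assumptions added for illustration, so the commands are printed instead of executed unless DRY_RUN=0.)

```shell
#!/bin/sh
# Sketch only: build mw1161..mw1169 with seq instead of typing the digits.
# DRY_RUN is a hypothetical safety switch for this example; with the
# default of 1 the ssh commands are printed, not run.
DRY_RUN="${DRY_RUN:-1}"

jobrunner_hosts() {
    # seq FIRST LAST prints one integer per line, both ends inclusive.
    for n in $(seq 1161 1169); do
        echo "mw${n}.eqiad.wmnet"
    done
}

pull_all() {
    for host in $(jobrunner_hosts); do
        if [ "$DRY_RUN" = "1" ]; then
            echo "ssh $host scap pull"
        else
            # -n stops ssh from reading stdin inside the loop.
            ssh -n "$host" scap pull
        fi
    done
}

pull_all
```

(Running it once with DRY_RUN left at 1 is a cheap way to check the generated host list before touching any box.)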
[13:38:02] (03CR) 10KartikMistry: "recheck" [debs/contenttranslation/apertium-kaz] - 10https://gerrit.wikimedia.org/r/296366 (https://phabricator.wikimedia.org/T107306) (owner: 10KartikMistry) [13:38:09] (03CR) 10Ema: [C: 032] varnish: bump nuke_limit [puppet] - 10https://gerrit.wikimedia.org/r/310266 (https://phabricator.wikimedia.org/T131502) (owner: 10Ema) [13:38:15] (03PS2) 10Ema: varnish: bump nuke_limit [puppet] - 10https://gerrit.wikimedia.org/r/310266 (https://phabricator.wikimedia.org/T131502) [13:38:19] (03CR) 10Ema: [V: 032] varnish: bump nuke_limit [puppet] - 10https://gerrit.wikimedia.org/r/310266 (https://phabricator.wikimedia.org/T131502) (owner: 10Ema) [13:38:38] addshore: thank you :] [13:38:42] [= thank you! [13:39:41] AHHHH [13:39:42] I remember [13:39:50] wmf-config/squid.php (mis named) [13:40:00] and the setting $wgSquidServersNoPurge (equally misnamed) [13:45:32] (03PS2) 10Filippo Giunchedi: site: install prometheus server in esams and ulsfo [puppet] - 10https://gerrit.wikimedia.org/r/309996 (https://phabricator.wikimedia.org/T126785) [13:45:34] (03PS2) 10Filippo Giunchedi: install_server: use separate /srv for bastions [puppet] - 10https://gerrit.wikimedia.org/r/309995 [13:48:11] (03CR) 10Giuseppe Lavagetto: base::puppet: allow using srv records (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/310269 (owner: 10Giuseppe Lavagetto) [13:48:17] (03PS2) 10Giuseppe Lavagetto: base::puppet: allow using srv records [puppet] - 10https://gerrit.wikimedia.org/r/310269 [13:48:34] ah [13:51:04] (03CR) 10Filippo Giunchedi: "@akosiaris yeah I think that makes sense while we're at it." [puppet] - 10https://gerrit.wikimedia.org/r/309995 (owner: 10Filippo Giunchedi) [13:51:12] (03PS3) 10Giuseppe Lavagetto: service::node: restrict readability of configurations. 
[puppet] - 10https://gerrit.wikimedia.org/r/309522 [13:52:00] (03CR) 10Alexandros Kosiaris: [C: 032] apertium-kaz: New upstream release and rebuild for Jessie [debs/contenttranslation/apertium-kaz] - 10https://gerrit.wikimedia.org/r/296366 (https://phabricator.wikimedia.org/T107306) (owner: 10KartikMistry) [13:53:37] hashar: increased load from transclusions is the most likely explanation [13:54:01] (03CR) 10Giuseppe Lavagetto: [C: 032] base::puppet: allow using srv records [puppet] - 10https://gerrit.wikimedia.org/r/310269 (owner: 10Giuseppe Lavagetto) [13:54:03] 06Operations, 10DBA: Drop PovWatch extension-related database tables from Wikimedia wikis - https://phabricator.wikimedia.org/T54924#2632821 (10Marostegui) enwiki -> s1 testwiki ->s3 commonswiki -> s4 non pooled slaves to do some testing before dropping the table for good s1 -> db1073 s3 -> db1044 s4 -> db10... [13:54:04] (03PS1) 10Alexandros Kosiaris: puppet-merge: Allow forcing, passing SHA1 args [puppet] - 10https://gerrit.wikimedia.org/r/310300 [13:54:06] (03PS1) 10Alexandros Kosiaris: puppetmaster: post-merge command is actually irrelevant [puppet] - 10https://gerrit.wikimedia.org/r/310301 [13:54:08] (03PS1) 10Alexandros Kosiaris: puppetmaster: Make puppet-merge a template [puppet] - 10https://gerrit.wikimedia.org/r/310302 [13:54:10] (03PS1) 10Alexandros Kosiaris: puppetmaster: Delete the post-merge hooks [puppet] - 10https://gerrit.wikimedia.org/r/310303 [13:54:12] (03PS1) 10Alexandros Kosiaris: puppetmaster: Change the backend forced ssh command [puppet] - 10https://gerrit.wikimedia.org/r/310304 [13:54:25] <_joe_> akosiaris: ahah I'll repay your favour [13:54:29] (03CR) 10jenkins-bot: [V: 04-1] puppet-merge: Allow forcing, passing SHA1 args [puppet] - 10https://gerrit.wikimedia.org/r/310300 (owner: 10Alexandros Kosiaris) [13:54:41] hashar: https://grafana.wikimedia.org/dashboard/db/restbase?panelId=13&fullscreen [13:54:44] jenkins already is [13:54:48] grrrr [13:55:06] (03CR) 10jenkins-bot: [V: 
04-1] puppetmaster: post-merge command is actually irrelevant [puppet] - 10https://gerrit.wikimedia.org/r/310301 (owner: 10Alexandros Kosiaris) [13:55:09] mobrovac: I was really just wondering [13:55:11] grrr [13:55:15] akosiaris: :(((( [13:55:24] This change or one of its cross-repo dependencies was unable to be automatically merged with the current state of its repository. Please rebase the change and upload a new patchset. [13:55:29] sigh, I 'll rebase [13:55:38] merge conflict in https://gerrit.wikimedia.org/r/#/c/310300/1 apparently [13:56:03] ERROR: content conflict in modules/admin/data/data.yaml [13:56:03] (03CR) 10jenkins-bot: [V: 04-1] puppetmaster: Make puppet-merge a template [puppet] - 10https://gerrit.wikimedia.org/r/310302 (owner: 10Alexandros Kosiaris) [13:56:04] or jenkins... [13:56:14] (03PS2) 10Alexandros Kosiaris: puppetmaster: Change the backend forced ssh command [puppet] - 10https://gerrit.wikimedia.org/r/310304 [13:56:16] (03PS2) 10Alexandros Kosiaris: puppetmaster: Make puppet-merge a template [puppet] - 10https://gerrit.wikimedia.org/r/310302 [13:56:17] the git rebase worked fine though [13:56:18] (03PS2) 10Alexandros Kosiaris: puppetmaster: Delete the post-merge hooks [puppet] - 10https://gerrit.wikimedia.org/r/310303 [13:56:20] (03PS2) 10Alexandros Kosiaris: puppet-merge: Allow forcing, passing SHA1 args [puppet] - 10https://gerrit.wikimedia.org/r/310300 [13:56:21] let's see now [13:56:22] (03PS2) 10Alexandros Kosiaris: puppetmaster: post-merge command is actually irrelevant [puppet] - 10https://gerrit.wikimedia.org/r/310301 [13:57:37] (03PS1) 10Giuseppe Lavagetto: base::puppet: remove whitespace [puppet] - 10https://gerrit.wikimedia.org/r/310305 [13:57:55] (03CR) 10Giuseppe Lavagetto: [C: 032 V: 032] base::puppet: remove whitespace [puppet] - 10https://gerrit.wikimedia.org/r/310305 (owner: 10Giuseppe Lavagetto) [14:00:24] (03CR) 10Giuseppe Lavagetto: varnish: point pybal_config to all puppetmaster frontends (031 comment) [puppet] - 
10https://gerrit.wikimedia.org/r/310271 (owner: 10Giuseppe Lavagetto) [14:02:11] (03PS2) 10Giuseppe Lavagetto: tcpircbot: add the new puppetmaster frontends, remove palladium [puppet] - 10https://gerrit.wikimedia.org/r/310270 [14:02:25] (03CR) 10jenkins-bot: [V: 04-1] puppetmaster: Make puppet-merge a template [puppet] - 10https://gerrit.wikimedia.org/r/310302 (owner: 10Alexandros Kosiaris) [14:02:28] hashar: " there is still a huge spam coming from other internal services that hit the API such as the wtpXXXX boxes" [14:02:36] hashar: what kind of spam are you referring to? [14:02:46] aohh [14:02:51] that is in the "xff" log bucket [14:02:58] on fluorine.eqiad.wmnet /a/mw-logs/xff.log [14:03:21] most is insternal traffic. I have pushed a change to drop the logs coming from the jobrunner hitting hhvm on localhost [14:03:23] !log T107306 uploaded to apt.wikimedia.org jessie-wikimedia: apertium-kaz_0.1.0~r61338-1+wmf1 [14:03:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:03:32] (03CR) 10jenkins-bot: [V: 04-1] puppetmaster: Delete the post-merge hooks [puppet] - 10https://gerrit.wikimedia.org/r/310303 (owner: 10Alexandros Kosiaris) [14:03:34] then there is all the parsoid box hitting the mw api internally as well [14:03:35] (03CR) 10Alexandros Kosiaris: "recheck" [debs/contenttranslation/apertium-kaz-tat] - 10https://gerrit.wikimedia.org/r/296369 (https://phabricator.wikimedia.org/T107306) (owner: 10KartikMistry) [14:03:54] with no XFF header set , so there is no point in logging that a wptXXXX box has hit us [14:05:23] ah i see [14:05:24] ok [14:06:18] (03CR) 10Giuseppe Lavagetto: [C: 032] tcpircbot: add the new puppetmaster frontends, remove palladium [puppet] - 10https://gerrit.wikimedia.org/r/310270 (owner: 10Giuseppe Lavagetto) [14:09:42] (03PS3) 10Alexandros Kosiaris: puppetmaster: Change the backend forced ssh command [puppet] - 10https://gerrit.wikimedia.org/r/310304 [14:09:44] (03PS3) 10Alexandros Kosiaris: 
puppetmaster: Make puppet-merge a template [puppet] - 10https://gerrit.wikimedia.org/r/310302 [14:09:46] (03PS3) 10Alexandros Kosiaris: puppetmaster: Delete the post-merge hooks [puppet] - 10https://gerrit.wikimedia.org/r/310303 [14:09:51] (03PS1) 10Gehel: wdqs - make RWStore configuration file configureable [puppet] - 10https://gerrit.wikimedia.org/r/310307 (https://phabricator.wikimedia.org/T144380) [14:11:26] (03CR) 10jenkins-bot: [V: 04-1] puppetmaster: Make puppet-merge a template [puppet] - 10https://gerrit.wikimedia.org/r/310302 (owner: 10Alexandros Kosiaris) [14:11:34] (03PS2) 10Gehel: wdqs - make RWStore configuration file configureable [puppet] - 10https://gerrit.wikimedia.org/r/310307 (https://phabricator.wikimedia.org/T144380) [14:11:57] akosiaris: i think the operations/puppet repo is borked in Zuul :/ [14:12:01] (03CR) 10jenkins-bot: [V: 04-1] puppetmaster: Delete the post-merge hooks [puppet] - 10https://gerrit.wikimedia.org/r/310303 (owner: 10Alexandros Kosiaris) [14:12:06] error: ssh://gerrit.wikimedia.org:29418/operations/puppet did not send all necessary objects [14:12:23] (03CR) 10jenkins-bot: [V: 04-1] wdqs - make RWStore configuration file configureable [puppet] - 10https://gerrit.wikimedia.org/r/310307 (https://phabricator.wikimedia.org/T144380) (owner: 10Gehel) [14:12:33] hashar: yeah, looks like it [14:12:39] hashar: what should we do ? 
[14:12:52] well [14:13:01] the smart one would figure out the exact root cause and prevent it [14:13:07] or write code to workaround / autofix [14:13:15] the pragmatic me is going to nuke puppet.git entirely :] [14:14:00] (03PS1) 10Gehel: wdqs - use configuration file generated by scap [puppet] - 10https://gerrit.wikimedia.org/r/310308 (https://phabricator.wikimedia.org/T144380) [14:14:13] (03PS3) 10Gehel: wdqs - make RWStore configuration file configureable [puppet] - 10https://gerrit.wikimedia.org/r/310307 (https://phabricator.wikimedia.org/T144380) [14:14:25] (03CR) 10jenkins-bot: [V: 04-1] wdqs - use configuration file generated by scap [puppet] - 10https://gerrit.wikimedia.org/r/310308 (https://phabricator.wikimedia.org/T144380) (owner: 10Gehel) [14:14:35] (03CR) 10jenkins-bot: [V: 04-1] wdqs - make RWStore configuration file configureable [puppet] - 10https://gerrit.wikimedia.org/r/310307 (https://phabricator.wikimedia.org/T144380) (owner: 10Gehel) [14:14:42] (03CR) 10Alexandros Kosiaris: varnish: point pybal_config to all puppetmaster frontends (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/310271 (owner: 10Giuseppe Lavagetto) [14:15:41] (03CR) 10Hashar: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/310300 (owner: 10Alexandros Kosiaris) [14:16:41] (03CR) 10Hashar: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/310302 (owner: 10Alexandros Kosiaris) [14:16:54] <_joe_> akosiaris: I'll merge that change as a subsequent one fixes that [14:17:06] (03CR) 10Hashar: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/310303 (owner: 10Alexandros Kosiaris) [14:17:09] (03CR) 10Hashar: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/310308 (https://phabricator.wikimedia.org/T144380) (owner: 10Gehel) [14:17:13] (03CR) 10Hashar: "recheck" [puppet] - 10https://gerrit.wikimedia.org/r/310307 (https://phabricator.wikimedia.org/T144380) (owner: 10Gehel) [14:17:27] gehel: akosiaris: your patch should be fine in CI now. 
Sorry I have screwed the puppet.git repo in Zuul :( [14:17:41] hashar: thanks! I was wondering... [14:17:52] hopefully [14:17:58] (03PS1) 10Volans: Automation: improve wmf-auto-reimage [puppet] - 10https://gerrit.wikimedia.org/r/310309 (https://phabricator.wikimedia.org/T143536) [14:18:07] thanks [14:18:43] moritzm: quick improvements here ^^^, I'll add the icinga stuff later [14:18:57] (03PS4) 10Gehel: wdqs - make RWStore configuration file configureable [puppet] - 10https://gerrit.wikimedia.org/r/310307 (https://phabricator.wikimedia.org/T144380) [14:19:07] gehel: akosiaris: all set. Sorry for the mess :( [14:19:35] (03CR) 10Alexandros Kosiaris: [C: 032] apertium-kaz-tat: Rebuild for Jessie and cleanup [debs/contenttranslation/apertium-kaz-tat] - 10https://gerrit.wikimedia.org/r/296369 (https://phabricator.wikimedia.org/T107306) (owner: 10KartikMistry) [14:20:41] volans: thanks, will review in a bit [14:22:01] (03CR) 10jenkins-bot: [V: 04-1] wdqs - make RWStore configuration file configureable [puppet] - 10https://gerrit.wikimedia.org/r/310307 (https://phabricator.wikimedia.org/T144380) (owner: 10Gehel) [14:23:08] (03PS5) 10Gehel: wdqs - make RWStore configuration file configureable [puppet] - 10https://gerrit.wikimedia.org/r/310307 (https://phabricator.wikimedia.org/T144380) [14:23:49] (03PS4) 10Alexandros Kosiaris: puppetmaster: Change the backend forced ssh command [puppet] - 10https://gerrit.wikimedia.org/r/310304 [14:24:39] !logT107306 uploaded to apt.wikimedia.org jessie-wikimedia: apertium-kaz-tat_0.2.1~r57554-1+wmf1 [14:24:48] !log T107306 uploaded to apt.wikimedia.org jessie-wikimedia: apertium-kaz-tat_0.2.1~r57554-1+wmf1 [14:24:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:26:47] !log deleting test_* indices on relforge cluster [14:26:51] (03CR) 10Giuseppe Lavagetto: [C: 031] puppet-merge: Allow forcing, passing SHA1 args [puppet] - 10https://gerrit.wikimedia.org/r/310300 (owner: 10Alexandros Kosiaris) 
[14:26:51] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:29:52] (03PS1) 10Alexandros Kosiaris: apertium: Enable it on SCB [puppet] - 10https://gerrit.wikimedia.org/r/310311 [14:30:11] (03PS2) 10Muehlenhoff: Remove my own old SSH key [puppet] - 10https://gerrit.wikimedia.org/r/308523 (owner: 10Addshore) [14:30:43] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "I am not sure I understand how is this irrelevant; if it is, let's just remove all this directly." (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/310301 (owner: 10Alexandros Kosiaris) [14:31:24] (03CR) 10Muehlenhoff: [C: 032] Remove my own old SSH key [puppet] - 10https://gerrit.wikimedia.org/r/308523 (owner: 10Addshore) [14:33:47] (03PS2) 10Muehlenhoff: horizon: change wikitech help URLs to use https [puppet] - 10https://gerrit.wikimedia.org/r/309704 (owner: 10Alex Monk) [14:34:30] (03CR) 10Gehel: [C: 032] wdqs - make RWStore configuration file configureable [puppet] - 10https://gerrit.wikimedia.org/r/310307 (https://phabricator.wikimedia.org/T144380) (owner: 10Gehel) [14:34:36] (03PS6) 10Gehel: wdqs - make RWStore configuration file configureable [puppet] - 10https://gerrit.wikimedia.org/r/310307 (https://phabricator.wikimedia.org/T144380) [14:35:40] (03CR) 10Muehlenhoff: [C: 032] horizon: change wikitech help URLs to use https [puppet] - 10https://gerrit.wikimedia.org/r/309704 (owner: 10Alex Monk) [14:36:01] (03PS7) 10Gehel: wdqs - make RWStore configuration file configureable [puppet] - 10https://gerrit.wikimedia.org/r/310307 (https://phabricator.wikimedia.org/T144380) [14:36:09] (03CR) 10Gehel: [V: 032] wdqs - make RWStore configuration file configureable [puppet] - 10https://gerrit.wikimedia.org/r/310307 (https://phabricator.wikimedia.org/T144380) (owner: 10Gehel) [14:38:23] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "LGTM in principle, but it could be improved a bit." 
(032 comments) [puppet] - 10https://gerrit.wikimedia.org/r/310302 (owner: 10Alexandros Kosiaris) [14:40:22] (03CR) 10Giuseppe Lavagetto: [C: 032] varnish: point pybal_config to all puppetmaster frontends [puppet] - 10https://gerrit.wikimedia.org/r/310271 (owner: 10Giuseppe Lavagetto) [14:40:28] (03PS2) 10Giuseppe Lavagetto: varnish: point pybal_config to all puppetmaster frontends [puppet] - 10https://gerrit.wikimedia.org/r/310271 [14:40:46] (03CR) 10Giuseppe Lavagetto: [V: 032] varnish: point pybal_config to all puppetmaster frontends [puppet] - 10https://gerrit.wikimedia.org/r/310271 (owner: 10Giuseppe Lavagetto) [14:42:00] (03PS2) 10Muehlenhoff: toollabs::proxy: Restrict to labs networks [puppet] - 10https://gerrit.wikimedia.org/r/309524 [14:44:34] (03CR) 10Muehlenhoff: [C: 032] toollabs::proxy: Restrict to labs networks [puppet] - 10https://gerrit.wikimedia.org/r/309524 (owner: 10Muehlenhoff) [14:46:42] <_joe_> moritzm, elukey, volans [14:46:59] <_joe_> the change I'm merging would break the new wmf-reimage [14:47:00] _joe_: ? [14:47:11] which part? what changes? [14:47:12] <_joe_> until we change the puppetmaster [14:47:19] <_joe_> https://gerrit.wikimedia.org/r/#/c/310272 [14:47:32] <_joe_> can we live with that, or should I re-fill palladium there? [14:47:41] until when? [14:48:03] <_joe_> it would break the new script until: 1) it starts using the SRV record 2) we switch that to puppetmaster1001 [14:48:13] <_joe_> which should be tomorrow morning at worst [14:48:16] 06Operations, 10Pybal, 06Services, 15User-mobrovac: Depool / repool scripts execute successfully even when the host has not been (r|d)epooled - https://phabricator.wikimedia.org/T145518#2632995 (10mobrovac) [14:48:21] <_joe_> if you're using it now [14:48:24] sure, totally fine from my pov [14:48:39] same thing from me [14:48:44] ok for me, I was adding the SRV anyway [14:49:12] do we have SRV also for icinga and deployment? [14:50:03] <_joe_> volans: what do you mean? 
[14:50:07] <_joe_> volans: nope [14:50:28] if we have SRV for deployment.eqiad.wmnet and icinga.wikimedia.org [14:50:35] !log drain and reboot restbase2004 [14:50:41] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:51:20] <_joe_> volans: nope [14:51:25] ok [14:51:41] (03CR) 10Giuseppe Lavagetto: [C: 032] puppetmaster: Allow new puppetmasters to work with wmf-reimage [puppet] - 10https://gerrit.wikimedia.org/r/310272 (owner: 10Giuseppe Lavagetto) [14:51:48] (03PS2) 10Giuseppe Lavagetto: puppetmaster: Allow new puppetmasters to work with wmf-reimage [puppet] - 10https://gerrit.wikimedia.org/r/310272 [14:52:40] 06Operations, 10Pybal, 06Services, 15User-mobrovac: Depool / repool scripts execute successfully even when the host has not been (r|d)epooled - https://phabricator.wikimedia.org/T145518#2633048 (10mobrovac) [14:57:31] (03PS2) 10Filippo Giunchedi: swift: enable thumbor on commons [puppet] - 10https://gerrit.wikimedia.org/r/310282 (https://phabricator.wikimedia.org/T139606) [14:57:57] (03CR) 10Giuseppe Lavagetto: "yeah it was just the next logical step; testing on a few hosts right away :)" [puppet] - 10https://gerrit.wikimedia.org/r/310273 (owner: 10Giuseppe Lavagetto) [14:58:59] (03CR) 10Filippo Giunchedi: [C: 032] swift: enable thumbor on commons [puppet] - 10https://gerrit.wikimedia.org/r/310282 (https://phabricator.wikimedia.org/T139606) (owner: 10Filippo Giunchedi) [15:00:04] marxarelli and jynus: Dear anthropoid, the time has come. Please deploy Upgrade MariaDB on beta cluster (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160913T1500). 
[15:01:49] hi, marxarelli [15:02:15] (03PS1) 10Giuseppe Lavagetto: base::puppet: use srv records for running puppet [puppet] - 10https://gerrit.wikimedia.org/r/310318 [15:02:23] (03PS2) 10Gehel: wdqs - use configuration file generated by scap [puppet] - 10https://gerrit.wikimedia.org/r/310308 (https://phabricator.wikimedia.org/T144380) [15:02:31] hey jynus [15:03:34] 06Operations, 10Analytics: kafkatee's logrotate/syslog default pkg files needs to be removed - https://phabricator.wikimedia.org/T145490#2633102 (10elukey) p:05Triage>03Low [15:04:00] so the new db instances should be good to go. i've disabled puppet on db1 and db2 just so the cherry-picked puppet patch doesn't mess anything up [15:04:09] 06Operations, 10Ops-Access-Requests, 13Patch-For-Review: Sudo access to the Analytics Druid cluster for the Analytics team - https://phabricator.wikimedia.org/T144726#2633106 (10elukey) 05Open>03Resolved [15:04:44] (03CR) 10Giuseppe Lavagetto: [C: 032] base::puppet: use srv records for running puppet [puppet] - 10https://gerrit.wikimedia.org/r/310318 (owner: 10Giuseppe Lavagetto) [15:07:13] jynus: mind if we do a hangout for the migration so i can learn some of your dba skills? :) [15:07:52] marxarelli, we can do the hangout [15:07:56] (03PS1) 10Filippo Giunchedi: thumbor: fix wikipedia vs wikimedia for commons [puppet] - 10https://gerrit.wikimedia.org/r/310320 [15:08:08] now, I do not think there will be any skills to learn :-) [15:08:29] I just push random buttons until it works [15:08:48] haha, i'm could always use a refresher on random button pushing [15:08:52] *i could* [15:08:57] apparently [15:09:32] for starters I cannot log in to "ssh deployment-db3.eqiad.wmflabs" [15:09:43] ah, it's db03 [15:09:47] ah! [15:10:34] ok, I am root on all of them now, we can start [15:10:52] marxarelli, are you ok with marostegui joining us? 
[15:11:03] sure thing [15:13:14] thanks [15:13:15] (03CR) 10Filippo Giunchedi: [C: 032] thumbor: fix wikipedia vs wikimedia for commons [puppet] - 10https://gerrit.wikimedia.org/r/310320 (owner: 10Filippo Giunchedi) [15:14:23] (03PS1) 10Giuseppe Lavagetto: base::puppet::params: lookup use_srv_record on hiera. [puppet] - 10https://gerrit.wikimedia.org/r/310322 [15:16:18] 06Operations, 10Cassandra, 06Services: restbase2004.codfw.wmnet data corruption - https://phabricator.wikimedia.org/T144826#2633125 (10fgiunchedi) restbase2004 rebooted now, back up and all instances started, though no errors reported [15:16:21] (03CR) 10Giuseppe Lavagetto: [C: 032] base::puppet::params: lookup use_srv_record on hiera. [puppet] - 10https://gerrit.wikimedia.org/r/310322 (owner: 10Giuseppe Lavagetto) [15:16:26] (03PS2) 10Giuseppe Lavagetto: base::puppet::params: lookup use_srv_record on hiera. [puppet] - 10https://gerrit.wikimedia.org/r/310322 [15:16:36] (03CR) 10Giuseppe Lavagetto: [V: 032] base::puppet::params: lookup use_srv_record on hiera. [puppet] - 10https://gerrit.wikimedia.org/r/310322 (owner: 10Giuseppe Lavagetto) [15:18:38] !log starting 2-hour read-only maintenance window for beta cluster migration [15:18:44] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:21:38] (03PS1) 10Andrew Bogott: puppetpanel: Add a 'Project Puppet' panel [puppet] - 10https://gerrit.wikimedia.org/r/310325 (https://phabricator.wikimedia.org/T91990) [15:22:25] (03PS2) 10Andrew Bogott: puppetpanel: Add a 'Project Puppet' panel [puppet] - 10https://gerrit.wikimedia.org/r/310325 (https://phabricator.wikimedia.org/T91990) [15:26:56] PROBLEM - puppet last run on mira is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. 
Failed resources (up to 3 shown): File[/usr/local/sbin/install-pkg-src] [15:27:45] (03PS3) 10Andrew Bogott: puppetpanel: Add a 'Project Puppet' panel [puppet] - 10https://gerrit.wikimedia.org/r/310325 (https://phabricator.wikimedia.org/T91990) [15:32:25] PROBLEM - puppet last run on gallium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:38:11] _joe_, akosiaris: ^^^ related to your work? [15:38:17] (03PS1) 10Giuseppe Lavagetto: Point eqiad SRV record for puppet agents to puppetmaster1001 [dns] - 10https://gerrit.wikimedia.org/r/310327 [15:38:17] <_joe_> yes [15:38:25] <_joe_> volans: this is the fix (I think) [15:38:25] ok [15:39:05] (03CR) 10Giuseppe Lavagetto: [C: 032] Point eqiad SRV record for puppet agents to puppetmaster1001 [dns] - 10https://gerrit.wikimedia.org/r/310327 (owner: 10Giuseppe Lavagetto) [15:40:40] <_joe_> volans: the gallium failure, not the mira one [15:40:51] !log shutting down relforge cluster for indices cleanup [15:40:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:41:29] yep, I was referring to gallium [15:41:33] mira one looks unrelated [15:41:54] <_joe_> gallium is fixed [15:42:37] RECOVERY - puppet last run on gallium is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:44:05] !log setting deployment-db1 and deployment-db1 mysqls in read only mode [15:44:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:46:11] !log test_* indices remvoed on relforge cluster, cluster restarted [15:46:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:49:04] RECOVERY - ElasticSearch health check for shards on relforge1001 is OK: OK - elasticsearch status relforge-eqiad: status: green, number_of_nodes: 2, unassigned_shards: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, timed_out: False, active_primary_shards: 91, 
task_max_waiting_in_queue_millis: 0, cluster_name: relforge-eqiad, relocating_shards: 0, active_shards_percent_as_number: 100.0, active_shards: 117, initializing_sh [15:49:15] RECOVERY - ElasticSearch health check for shards on relforge1002 is OK: OK - elasticsearch status relforge-eqiad: status: green, number_of_nodes: 2, unassigned_shards: 0, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, timed_out: False, active_primary_shards: 91, task_max_waiting_in_queue_millis: 0, cluster_name: relforge-eqiad, relocating_shards: 0, active_shards_percent_as_number: 100.0, active_shards: 117, initializing_sh [15:50:16] RECOVERY - puppet last run on mira is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures [15:50:39] issues on s5? [15:50:58] did it went fully down? [15:52:01] db1082 [15:52:06] PROBLEM - dhclient process on db1082 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:52:20] network or did it crash? [15:52:32] PROBLEM - mysqld processes on db1082 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:52:35] jynus: ping if you need help, taking a quick look [15:52:36] jynus: it responds to ping [15:52:50] cannot ssh [15:52:59] nope [15:53:04] <_joe_> jynus: let me try console [15:53:09] * apergos peeks in [15:53:18] 06Operations, 06Performance-Team, 10Thumbor: AttributeError: 'Engine' object has no attribute 'exif' - https://phabricator.wikimedia.org/T145504#2633354 (10Gilles) Possibly coming from tiffs? ``` Sep 13 15:49:38 thumbor1001 thumbor@8838[105439]: thumbor:ERROR Ignored error handling exif for reorientation S... [15:53:35] kernel panic, _jo_ [15:53:45] yay :( [15:53:53] either kernel or controller failure [15:53:58] <_joe_> jynus: ok so I'll let you deal with it :( [15:53:58] based on previous issues [15:54:11] The graphs do not show any peak or anything right before it (or it was too late to caught it) [15:54:11] <_joe_> jynus: should I depool it now? 
[15:54:23] _joe_, it is depooled automatically [15:54:41] let's make sure it doesn't reppol by forcing it on config [15:55:17] _joe_, can you check efectively no signitificative 5XX [15:55:24] while I prepare the patch [15:55:25] PROBLEM - puppet last run on db1082 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:55:48] PROBLEM - MariaDB Slave IO: s5 on db1082 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:55:49] PROBLEM - SSH on db1082 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [15:55:58] PROBLEM - HP RAID on db1082 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds. [15:56:02] * volans silencing ^^^ [15:56:09] PROBLEM - salt-minion processes on db1082 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:56:23] <_joe_> jynus: I confirm no impact on 5xx [15:56:28] PROBLEM - Check size of conntrack table on db1082 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [15:56:29] <_joe_> no significant one, at least [15:56:31] _joe_, good, load balanced worked [15:56:40] 2h downtime on icinga set [15:56:40] because it is one of the large servers [15:56:47] volans, we did it at the same time [15:56:58] let's disable alerts too to avoid the up paging [15:57:06] ok i'll take care of it [15:57:07] volans, if you want to do that [15:57:13] while I do the patch [15:57:23] hey [15:57:44] everything ok AIUI? 
[15:57:52] hi paravoid [15:58:02] notification disabled jynus [15:58:12] <_joe_> paravoid: good morning :) [15:58:16] <_joe_> yes, all ok [15:58:26] <_joe_> well, for some values of ok, ofc [15:58:33] (03PS1) 10Jcrespo: Depool db1082 (crashed) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/310335 [15:58:36] <_joe_> a db just crashed, but no user impact AFAICT [15:58:41] we could literally did nothing [15:58:48] and we would have been ok [15:58:55] thanks to the load balancer [15:59:10] this is just to avoid future flopping and investigate [15:59:28] marostegui, +1 https://gerrit.wikimedia.org/r/#/c/310335/ [15:59:31] ? [15:59:45] (03CR) 10Marostegui: [C: 031] Depool db1082 (crashed) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/310335 (owner: 10Jcrespo) [15:59:50] jynus: done [15:59:55] (03CR) 10Jcrespo: [C: 032] Depool db1082 (crashed) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/310335 (owner: 10Jcrespo) [16:00:04] godog, moritzm, and _joe_: Dear anthropoid, the time has come. Please deploy Puppet SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160913T1600). [16:00:04] Krenair, bd808, and Addshore: A patch you scheduled for Puppet SWAT(Max 8 patches) is about to be deployed. Please be available during the process. [16:00:10] (03PS1) 10Giuseppe Lavagetto: Point the SRV record to puppetmaster1001 everywhere [dns] - 10https://gerrit.wikimedia.org/r/310336 [16:00:16] o/ [16:00:17] <_joe_> I'm puppet-swatting [16:00:18] my patch has already been merged! [16:02:13] jynus: I tried also with salt, no success [16:02:19] <_joe_> Krenair: around? 
I'm looking at https://gerrit.wikimedia.org/r/#/c/309337 [16:02:29] !log jynus@tin Synchronized wmf-config/db-eqiad.php: Depool db1082 (duration: 01m 00s) [16:02:33] hi [16:02:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:02:40] volans, no, I can see panicking really well from console [16:02:40] <_joe_> and I have a few stylistic issues with it, but we can fix those at a later time [16:02:50] I would have a look at IPMI logs on reboot [16:02:55] ok [16:03:01] ping if you need anything [16:03:16] <_joe_> the patch in itself is ok, but please use argparse and not if users[0] == '--wikitext' :P [16:03:16] only significative errors are when it crashed, until the load balancer is aware [16:03:46] I am going to force a reboot, but not start mysql or anything [16:04:14] 06Operations, 10Analytics-Cluster: Migrate titanium to jessie (archiva.wikimedia.org upgrade) - https://phabricator.wikimedia.org/T123725#1936502 (10Dzahn) Cool, i'll take it as a reminder to shut titanium down after a waiting period. [16:04:18] (03CR) 10Giuseppe Lavagetto: [C: 032] "Merging because the patch is correct and useful, but please fix the argument parsing to do it properly." [puppet] - 10https://gerrit.wikimedia.org/r/309337 (owner: 10Alex Monk) [16:04:25] 06Operations, 10Analytics-Cluster: Migrate titanium to jessie (archiva.wikimedia.org upgrade) - https://phabricator.wikimedia.org/T123725#2633405 (10Dzahn) a:03Dzahn [16:04:26] (03PS3) 10Giuseppe Lavagetto: admin: allow matrix.py to output a wikitext table [puppet] - 10https://gerrit.wikimedia.org/r/309337 (owner: 10Alex Monk) [16:04:30] Okay, I'll make it use argparse [16:04:40] thanks [16:04:44] <_joe_> Krenair: I know I can trust you on that ;) [16:04:51] _joe_: thanks for puppet swat today! 
[16:04:51] (03CR) 10Giuseppe Lavagetto: [V: 032] admin: allow matrix.py to output a wikitext table [puppet] - 10https://gerrit.wikimedia.org/r/309337 (owner: 10Alex Monk) [16:05:33] <_joe_> Krenair: the horizon one has already been merged I see [16:06:22] yep, moritzm did that earlier (thanks!) [16:06:22] (03CR) 10Giuseppe Lavagetto: [C: 032] hiera_lookup util: add support for labtest realm, fix check for labs [puppet] - 10https://gerrit.wikimedia.org/r/309685 (owner: 10Alex Monk) [16:06:34] (03PS3) 10Giuseppe Lavagetto: hiera_lookup util: add support for labtest realm, fix check for labs [puppet] - 10https://gerrit.wikimedia.org/r/309685 (owner: 10Alex Monk) [16:06:38] (03CR) 10Giuseppe Lavagetto: [V: 032] hiera_lookup util: add support for labtest realm, fix check for labs [puppet] - 10https://gerrit.wikimedia.org/r/309685 (owner: 10Alex Monk) [16:06:44] PROBLEM - HHVM jobrunner on mw1162 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [16:06:52] !log power resetting db1082 [16:06:55] <_joe_> can someone look into mw1162? [16:06:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:08:21] I can take a look [16:09:55] <_joe_> bd808: uhm I don't have any idea what's the context of your patch [16:09:57] marxarelli, marostegui, I say we continue with labs maintenance, so at least we leave it in a proper state, even if it cannot finish [16:10:13] jynus: sounds good to me [16:10:27] <_joe_> to be honest, I don't even know where those modules are used [16:10:31] I think we still will be able to do it [16:10:34] jynus: sounds good [16:10:36] _joe_: its only used in one Labs project. 
Brand new module and role as of last week [16:10:40] <_joe_> Krenair: your patches are live [16:10:48] thanks _joe_ [16:10:54] <_joe_> bd808: ok then, I'll merge becuase "we don't care" :P [16:11:01] :) awesome [16:11:07] <_joe_> and the code seemed correct to me as 1-km-view [16:11:26] (03CR) 10Giuseppe Lavagetto: [C: 032] external_proxy: Respect XFF headers [puppet] - 10https://gerrit.wikimedia.org/r/308780 (https://phabricator.wikimedia.org/T144290) (owner: 10BryanDavis) [16:11:33] (03PS3) 10Giuseppe Lavagetto: external_proxy: Respect XFF headers [puppet] - 10https://gerrit.wikimedia.org/r/308780 (https://phabricator.wikimedia.org/T144290) (owner: 10BryanDavis) [16:12:08] <_joe_> for this kind of cases ^^ if we had a labs environment where bd808 had merge rights, he won't be needing me as an executor [16:12:12] (03CR) 10Giuseppe Lavagetto: [V: 032] external_proxy: Respect XFF headers [puppet] - 10https://gerrit.wikimedia.org/r/308780 (https://phabricator.wikimedia.org/T144290) (owner: 10BryanDavis) [16:12:23] true enough [16:13:11] godog: puppetswat still on? [16:13:16] <_joe_> mobrovac: NO [16:13:18] <_joe_> ahahahahah [16:13:31] <_joe_> like you usually wait for it :P [16:13:41] i do too! [16:13:59] _joe_: you on puppetswat today? [16:14:40] <_joe_> yes [16:14:55] 06Operations, 10DBA: Investigate db1082 crash - https://phabricator.wikimedia.org/T145533#2633433 (10Marostegui) [16:15:54] <_joe_> mobrovac: what would you want to merge? [16:16:14] _joe_: i need a coordinated code/config deploy for gerrit 309486 [16:16:20] adding it to the wikipage now [16:16:37] <_joe_> mobrovac: not really puppetswat material then... [16:16:47] sigh [16:16:51] <_joe_> don't add it to the page, we'll do it in... 15? 
[16:17:02] marxarelli, we are on the hangout again working on the beta maintenance [16:17:05] !log T144826: Removing compaction rate limit, increasing compactor threads (from 10 to 20), and beginning scrub of local_group_wikipedia_T_parsoid_html.data (restbase2004-b.codfw.wmnet) [16:17:07] T144826: restbase2004.codfw.wmnet data corruption - https://phabricator.wikimedia.org/T144826 [16:17:07] _joe_: kk [16:17:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:17:13] (in case you want to join us) [16:21:45] (03PS1) 10Alex Monk: admin: use argparse in matrix.py [puppet] - 10https://gerrit.wikimedia.org/r/310344 [16:23:10] (03CR) 10Dzahn: "please edit the commit message to make clear what is being worked on (gerrit? diffusion? gitblit redirects?)" [puppet] - 10https://gerrit.wikimedia.org/r/308885 (https://phabricator.wikimedia.org/T137354) (owner: 10Paladox) [16:24:13] !log dump hhvm backtrace on mw1162 and restart hhvm, apache gets connection refused [16:24:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:24:40] RECOVERY - HHVM jobrunner on mw1162 is OK: HTTP OK: HTTP/1.1 200 OK - 222 bytes in 0.004 second response time [16:25:05] <_joe_> Krenair: why you removed HEADER_END? [16:25:36] !log change-prop deploying d701a69 [16:25:40] <_joe_> it will not print a newline in the wikitext case, which I thought was making the output clearer? [16:25:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:26:37] _joe_, I noticed it seemed to be redundant [16:26:40] (03PS2) 10Giuseppe Lavagetto: Add security header filters [puppet] - 10https://gerrit.wikimedia.org/r/309486 (owner: 10GWicke) [16:26:49] it was just adding an extra newline in the output [16:27:05] <_joe_> Krenair: ok, will merge after I'm done with mobrovac [16:27:16] ok [16:27:18] <_joe_> mobrovac: ok to merge/run puppet? 
[16:27:35] yup _joe_, i'll deploy after that [16:27:58] (03CR) 10Giuseppe Lavagetto: [C: 032] Add security header filters [puppet] - 10https://gerrit.wikimedia.org/r/309486 (owner: 10GWicke) [16:29:52] 06Operations, 10DBA: Investigate db1082 crash - https://phabricator.wikimedia.org/T145533#2633472 (10Marostegui) @jcrespo saw a kernel panic when he logged via console. At a quick glance: Also the server has been showing kernel errors lately (for the last few days: ``` Sep 13 11:06:17 db1082 kernel: [888031... [16:30:05] <_joe_> mobrovac: running puppet, I'll tell you when I'm done [16:30:21] kk [16:31:09] _joe_, btw you mentioned conftool isn't working in beta like it is in prod - I don't know much about that tool but I'm going to see what I can do [16:32:08] (03PS6) 10Paladox: Fix broken refs/meta/config diffusion links in access section [puppet] - 10https://gerrit.wikimedia.org/r/308885 (https://phabricator.wikimedia.org/T137354) [16:32:14] (03PS7) 10Paladox: Fix broken refs/meta/config diffusion links in access section [puppet] - 10https://gerrit.wikimedia.org/r/308885 (https://phabricator.wikimedia.org/T137354) [16:32:52] <_joe_> Krenair: it just can't, at least not for poolingdepooling hosts [16:33:01] <_joe_> but I'll be happy to help setting it up there [16:33:17] <_joe_> actually, I should do it [16:33:40] <_joe_> the issue is, some of the things we use it for in prod cannot very easily be reproduced in labs [16:33:55] <_joe_> that is listing varnish backends, and lvs pools, and dsh groups [16:34:06] <_joe_> I guess we could do it for varnish if we want to [16:34:29] <_joe_> it would almost be as it is in prod [16:34:43] <_joe_> well, not really, but still [16:35:04] <_joe_> mobrovac: done; with the exception of rb2004 where puppet is disabled [16:35:04] 06Operations, 06Performance-Team, 10Thumbor: 'NoneType' object has no attribute 'lstrip' - https://phabricator.wikimedia.org/T145505#2633480 (10Gilles) I'm not sure why those fail in production, when I 
download those files and try them locally, they're correctly detected as SVG. [16:35:13] <_joe_> but I guess that's depooled anyways, right? [16:35:20] <_joe_> let me check [16:35:45] it should be back in the pool, it's cassandra there, rb is working fine [16:35:57] <_joe_> ok so we have a problem [16:36:09] <_joe_> can I reenable puppet there? [16:36:28] _joe_: puppet isn't disabled on 2004 [16:36:40] it is on 1013 tho [16:36:49] oh? [16:36:51] why? [16:36:56] oh your tests [16:37:00] <_joe_> yeah sorry read the output backwards [16:37:01] hmmm [16:37:04] to preserve a hacked-in GC setting we're testing [16:37:16] ok, i'll manually copy the rb config there [16:37:29] mobrovac: we can reenable [16:37:39] <_joe_> ok [16:37:41] mobrovac: we can reenable [16:37:43] grrr [16:37:52] <_joe_> can you do that guys? [16:38:01] yup, _joe_ i'll do it [16:38:03] we can reenable, and then i can disable and re-hack it after, as long as we don't restart Cassandra [16:38:36] _joe_: sorry, i know you hate it when we disable puppet; don't really know what else to do in such situations :) [16:38:43] (03CR) 10Giuseppe Lavagetto: [C: 032] admin: use argparse in matrix.py [puppet] - 10https://gerrit.wikimedia.org/r/310344 (owner: 10Alex Monk) [16:38:46] seems the lesser of evils [16:38:48] (03PS2) 10Giuseppe Lavagetto: admin: use argparse in matrix.py [puppet] - 10https://gerrit.wikimedia.org/r/310344 (owner: 10Alex Monk) [16:39:06] urandom: running puppet on rb1013 now, want me to disable it right after?
[16:39:16] mobrovac: just ping me [16:39:19] k [16:39:20] <_joe_> urandom: structure our manifests so that it's relatively easy to hack in GC settings in puppet too [16:39:24] urandom: ping, {{done}} [16:39:24] lol [16:39:25] mobrovac: i need to re-hack the conf anyway [16:39:31] <_joe_> whenever we need to [16:39:32] _joe_, we have dsh groups and varnish in labs, just not lvs [16:39:37] <_joe_> but that's for another day [16:39:49] _joe_: except it's really not something you'd want to change on a host-by-host basis anyway [16:40:16] <_joe_> urandom: you want to be able to do it for testing, anyways, this is matter of another day :) [16:40:17] other than these exceptional cases [16:40:24] k [16:40:38] 06Operations, 06Performance-Team, 10Thumbor: AttributeError: 'Engine' object has no attribute 'exif' - https://phabricator.wikimedia.org/T145504#2633495 (10Gilles) Unfortunately this one comes without context telling us which file is affected. [16:41:11] (03CR) 10Giuseppe Lavagetto: [V: 032] admin: use argparse in matrix.py [puppet] - 10https://gerrit.wikimedia.org/r/310344 (owner: 10Alex Monk) [16:41:57] !log restbase deploy start of d10d759 [16:42:00] (03PS1) 10Gehel: maps - fix witespace issue in notify-tilerator [puppet] - 10https://gerrit.wikimedia.org/r/310350 [16:42:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:42:17] (03Abandoned) 10Giuseppe Lavagetto: wmflib: add map_resources function [puppet] - 10https://gerrit.wikimedia.org/r/308705 (owner: 10Giuseppe Lavagetto) [16:43:05] (03CR) 10Giuseppe Lavagetto: [C: 032] Point the SRV record to puppetmaster1001 everywhere [dns] - 10https://gerrit.wikimedia.org/r/310336 (owner: 10Giuseppe Lavagetto) [16:45:57] 06Operations, 06Performance-Team, 10Thumbor: AttributeError: 'Engine' object has no attribute 'exif' - https://phabricator.wikimedia.org/T145504#2633510 (10Gilles) The gifsicle engine maybe? 
Anyway the exception is harmless because it's caught, but this creates needless noise. [16:46:04] 06Operations, 06Performance-Team, 10Thumbor: AttributeError: 'Engine' object has no attribute 'exif' - https://phabricator.wikimedia.org/T145504#2633511 (10Gilles) p:05Normal>03Low [16:47:09] PROBLEM - cassandra-b CQL 10.192.32.138:9042 on restbase2004 is CRITICAL: Connection refused [16:48:47] ACKNOWLEDGEMENT - cassandra-b CQL 10.192.32.138:9042 on restbase2004 is CRITICAL: Connection refused eevans Wack-a-mole continues (https://phabricator.wikimedia.org/T144826) [16:49:49] (03PS1) 10Chad: Remove leftover include [mediawiki-config] - 10https://gerrit.wikimedia.org/r/310351 [16:50:59] (03CR) 10Chad: [C: 032] Remove leftover include [mediawiki-config] - 10https://gerrit.wikimedia.org/r/310351 (owner: 10Chad) [16:51:26] (03Merged) 10jenkins-bot: Remove leftover include [mediawiki-config] - 10https://gerrit.wikimedia.org/r/310351 (owner: 10Chad) [16:52:24] (03PS5) 10Dzahn: Workaround a bug in gerrit on Microsoft Edge [puppet] - 10https://gerrit.wikimedia.org/r/309385 (https://phabricator.wikimedia.org/T145130) (owner: 10Paladox) [16:53:09] !log demon@tin Synchronized multiversion/updateWikiversions: unbreak myself (duration: 00m 48s) [16:53:09] (03PS6) 10Paladox: gerrit: workaround a CSS bug with Microsoft Edge [puppet] - 10https://gerrit.wikimedia.org/r/309385 (https://phabricator.wikimedia.org/T145130) [16:53:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:53:54] ostriches! can I squeeze this one into the branch? 
:P https://gerrit.wikimedia.org/r/#/c/310338/ [16:54:11] (03CR) 10Dzahn: [C: 032] "alright, saw the diff in screenshots" [puppet] - 10https://gerrit.wikimedia.org/r/309385 (https://phabricator.wikimedia.org/T145130) (owner: 10Paladox) [16:54:19] Thanks ^^ [16:54:28] !log restbase deploy end of d10d759 [16:54:31] _joe_: ^^ [16:54:32] done [16:54:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [16:55:38] addshore: Gimme a few mins to finish the branching so it'll get auto-merged, but yes. [16:56:05] awesome, I was minutes too late again ;) I CPed to https://gerrit.wikimedia.org/r/#/c/310352 [16:59:34] !log beta dbs back in rw mode [16:59:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:00:04] yurik, gwicke, cscott, arlolra, subbu, halfak, and Amir1: Respected human, time to deploy Services – Graphoid / Parsoid / OCG / Citoid / ORES (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160913T1700). Please do the needful. [17:01:06] okay ostriches the patch is https://gerrit.wikimedia.org/r/#/c/310352 and I have given it a +1 (the one on master also has a +1) but now I have to dash!! If it gets missed its not too bad and I can swat it at some point! [17:01:12] tata for now! [17:02:31] mutante: thoughts on https://gerrit.wikimedia.org/r/#/c/309995/ ? IIRC you last reimaged bast3001 ? [17:04:18] (03PS2) 10Gehel: maps - fix witespace issue in notify-tilerator [puppet] - 10https://gerrit.wikimedia.org/r/310350 [17:06:05] (03CR) 10Gehel: [C: 032] maps - fix witespace issue in notify-tilerator [puppet] - 10https://gerrit.wikimedia.org/r/310350 (owner: 10Gehel) [17:06:30] PROBLEM - puppet last run on db2049 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [17:08:26] (03PS2) 10Volans: Automation: improve wmf-auto-reimage [puppet] - 10https://gerrit.wikimedia.org/r/310309 (https://phabricator.wikimedia.org/T143536) [17:09:47] !log kill stuck notification of tilerator on maps1001.eqiad.wmnet [17:11:17] godog: it seems good, _if_ that partman recipe works for bast3001 [17:11:23] but i don't know if it does [17:11:42] note the other hosts in esams all use the same one [17:11:51] and are of the same hardware type [17:12:21] eeden,maerlant etc [17:12:49] (03PS1) 10Chad: group0 to wmf.19 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/310354 [17:14:09] 06Operations, 10DBA: Investigate db1082 crash - https://phabricator.wikimedia.org/T145533#2633623 (10jcrespo) I am adding Moritz, not expecting to do anything here, but just a heads up in case he is aware of any recent kernel issue and we are behind in updates for this server. Let's do proper debugging tomorrow. [17:14:43] mutante: indeed, ok I'm fairly sure the recipe itself works given the situation (just two disks, sw raid), aside from that, is there anything in /srv to be salvaged? [17:15:07] !log demon@tin Started scap: testwiki to wmf.19 + l10n bootstrap [17:15:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:15:51] (03CR) 10Dzahn: [C: 031] "the provided link is "403 Forbidden" for non-admins, but http://php.net/manual/en/ini.core.php#ini.always-populate-raw-post-data confirms" [puppet] - 10https://gerrit.wikimedia.org/r/309214 (owner: 1020after4) [17:17:37] godog: the /srv/tftp* stuff comes from the puppet role installserver::tftp_server . so no, doesn't have to be salvaged [17:18:06] godog: but /home should probably be saved and restored [17:19:48] mutante: heh, is there a recommended way ?
I'm aware of either backups or rsync [17:21:22] (03CR) 10Muehlenhoff: [C: 031] "Looks good to me" [puppet] - 10https://gerrit.wikimedia.org/r/310309 (https://phabricator.wikimedia.org/T143536) (owner: 10Volans) [17:21:39] (03PS4) 10Andrew Bogott: puppetpanel: Add a 'Project Puppet' panel [puppet] - 10https://gerrit.wikimedia.org/r/310325 (https://phabricator.wikimedia.org/T91990) [17:21:54] (03PS8) 10Paladox: Fix broken refs/meta/config diffusion links in access section [puppet] - 10https://gerrit.wikimedia.org/r/308885 (https://phabricator.wikimedia.org/T137354) [17:21:58] godog: yea, that's the 2 ways. i'd do rsync but actually.. restore from bacula should be easier, since it's already backed up [17:21:58] (03PS9) 10Paladox: Fix broken refs/meta/config diffusion links in access section [puppet] - 10https://gerrit.wikimedia.org/r/308885 (https://phabricator.wikimedia.org/T137354) [17:22:36] godog: so "bconsole" on helium and restore .. eh.. wait . it's just 8M :p [17:23:06] hehe indeed, ok thanks mutante I'll take another look tomorrow [17:24:05] (03CR) 10Andrew Bogott: [C: 032] puppetpanel: Add a 'Project Puppet' panel [puppet] - 10https://gerrit.wikimedia.org/r/310325 (https://phabricator.wikimedia.org/T91990) (owner: 10Andrew Bogott) [17:24:22] godog: looks like nobody has anything in home, except maybe a few dot files, guess it doesnt even matter [17:24:31] !log demon@tin scap failed: CalledProcessError Command '/usr/local/bin/mwscript rebuildLocalisationCache.php --wiki="testwiki" --outdir="/tmp/scap_l10n_4282891950" --threads=4 --lang en --quiet' returned non-zero exit status 1 (duration: 09m 23s) [17:24:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:24:48] new extension? [17:25:03] Ummm.... 
[17:25:06] (03CR) 10Dzahn: [C: 031] install_server: use separate /srv for bastions [puppet] - 10https://gerrit.wikimedia.org/r/309995 (owner: 10Filippo Giunchedi) [17:25:20] A copy of your installation's LocalSettings.php must exist and be readable in the source directory. Use --conf to specify it. [17:25:22] .... [17:25:24] wot? [17:25:27] urandom: you there? [17:26:50] (03CR) 10Dzahn: "as long as this partman recipe works with the bast3001 hardware this looks good. (it uses the current one because all the other esams host" [puppet] - 10https://gerrit.wikimedia.org/r/309995 (owner: 10Filippo Giunchedi) [17:26:57] Ah dangit, I see what I did. [17:28:35] (03CR) 1020after4: "Please merge." [puppet] - 10https://gerrit.wikimedia.org/r/309214 (owner: 1020after4) [17:30:31] urandom: I am going afk but we can deploy the cassandra logging change tomorrow if you have time [17:30:34] let me know [17:32:00] RECOVERY - puppet last run on db2049 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [17:33:56] (03PS3) 10Volans: Automation: improve wmf-auto-reimage [puppet] - 10https://gerrit.wikimedia.org/r/310309 (https://phabricator.wikimedia.org/T143536) [17:35:28] 06Operations, 10ChangeProp, 10DBA, 10MediaWiki-API, and 4 others: Investigate slow transcludedin query - https://phabricator.wikimedia.org/T145079#2633741 (10jcrespo) I regenerated the table statistics without sucess, I will try to tune the query planner next forcing the usage of histograms. ``` MariaDB d... 
[17:36:51] !log demon@tin Started scap: testwiki to wmf.19 + l10n bootstrap (try 2) [17:36:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:40:03] (03PS11) 10Dduvall: beta: Create and mount LVM volumes for mariadb [puppet] - 10https://gerrit.wikimedia.org/r/305668 (https://phabricator.wikimedia.org/T138778) [17:40:05] (03PS1) 10Dduvall: beta: Install MariaDB 10 [puppet] - 10https://gerrit.wikimedia.org/r/310360 (https://phabricator.wikimedia.org/T138778) [17:40:33] (03PS2) 10Dzahn: phabricator php.ini: set always_populate_raw_post_data = -1 [puppet] - 10https://gerrit.wikimedia.org/r/309214 (owner: 1020after4) [17:40:37] (03CR) 10Dzahn: [C: 032] phabricator php.ini: set always_populate_raw_post_data = -1 [puppet] - 10https://gerrit.wikimedia.org/r/309214 (owner: 1020after4) [17:43:55] 06Operations, 10Ops-Access-Requests, 13Patch-For-Review: Access to people.wikimedia.org for Volker_E - https://phabricator.wikimedia.org/T143465#2633782 (10Dzahn) @Volker_E ping? [17:44:53] (03CR) 10Jcrespo: [C: 031] "This is right, just it cannot be applied until the migration is done, or it may have unintended consequences." 
[puppet] - 10https://gerrit.wikimedia.org/r/310360 (https://phabricator.wikimedia.org/T138778) (owner: 10Dduvall) [17:46:11] (03CR) 10Volans: [C: 032] Automation: improve wmf-auto-reimage [puppet] - 10https://gerrit.wikimedia.org/r/310309 (https://phabricator.wikimedia.org/T143536) (owner: 10Volans) [17:46:21] (03PS4) 10Volans: Automation: improve wmf-auto-reimage [puppet] - 10https://gerrit.wikimedia.org/r/310309 (https://phabricator.wikimedia.org/T143536) [17:46:24] (03CR) 10Volans: [V: 032] Automation: improve wmf-auto-reimage [puppet] - 10https://gerrit.wikimedia.org/r/310309 (https://phabricator.wikimedia.org/T143536) (owner: 10Volans) [17:51:03] 06Operations, 10ops-codfw: rack and initial configuration of wtp2001-2020 - https://phabricator.wikimedia.org/T86807#2633801 (10Dzahn) wtp2019 has been reported as down for about 4.5 days. [17:51:20] (03CR) 10Smalyshev: [C: 031] wdqs - use configuration file generated by scap [puppet] - 10https://gerrit.wikimedia.org/r/310308 (https://phabricator.wikimedia.org/T144380) (owner: 10Gehel) [17:53:19] !log reimaging mw2198 that failed earlier today [17:53:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:55:10] PROBLEM - Apache HTTP on mw2198 is CRITICAL: Connection refused [17:55:12] !log wtp2019 - down for a couple of days. console says "Alert! System fatal error during previous boot" [17:55:16] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:55:33] !log wtp2019 Uncorrectable Memory Error [17:55:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:55:58] PROBLEM - salt-minion processes on mw2198 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion [17:56:09] PROBLEM - puppet last run on mw2198 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 28 minutes ago with 1 failures.
Failed resources (up to 3 shown): Service[mw-cgroup] [17:56:23] (03PS1) 10Andrew Bogott: puppet panel: Fix incorrect path [puppet] - 10https://gerrit.wikimedia.org/r/310361 [17:56:43] (03CR) 10jenkins-bot: [V: 04-1] puppet panel: Fix incorrect path [puppet] - 10https://gerrit.wikimedia.org/r/310361 (owner: 10Andrew Bogott) [17:57:54] (03PS2) 10Andrew Bogott: puppet panel: Fix incorrect path [puppet] - 10https://gerrit.wikimedia.org/r/310361 [17:58:32] !log wtp2019 - powercycled, back up without the error, services started [17:58:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:58:51] (03PS5) 10Paladox: Add support for searching gerrit using bug:T1 [puppet] - 10https://gerrit.wikimedia.org/r/308753 (https://phabricator.wikimedia.org/T85002) [17:59:08] RECOVERY - Host wtp2019 is UP: PING OK - Packet loss = 0%, RTA = 36.70 ms [17:59:18] (03CR) 1020after4: [C: 031] Revert "contint:firewall: let phabricator talk to gearman" [puppet] - 10https://gerrit.wikimedia.org/r/310039 (https://phabricator.wikimedia.org/T137323) (owner: 10Hashar) [17:59:24] (03PS5) 10Paladox: facilities: Fix variable contains an uppercase letter [puppet] - 10https://gerrit.wikimedia.org/r/308338 (https://phabricator.wikimedia.org/T93645) [17:59:39] (03PS3) 10Paladox: labs_dns: Fix optional parameter listed before required parameter [puppet] - 10https://gerrit.wikimedia.org/r/308340 (https://phabricator.wikimedia.org/T93645) [17:59:56] 06Operations, 10Pybal, 06Services, 15User-mobrovac: Depool / repool scripts execute successfully even when the host has not been (r|d)epooled - https://phabricator.wikimedia.org/T145518#2633866 (10mobrovac) [18:00:04] anomie, ostriches, thcipriani, hashar, and twentyafterfour: Dear anthropoid, the time has come. Please deploy Morning SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160913T1800). 
[18:00:25] Oh good nothing on it since I stole scap :p [18:00:37] 06Operations, 06Operations-Software-Development: wmfpuppet Salt module deprecation warning - https://phabricator.wikimedia.org/T145544#2633870 (10Volans) [18:01:21] (03PS2) 10Dzahn: Revert "contint:firewall: let phabricator talk to gearman" [puppet] - 10https://gerrit.wikimedia.org/r/310039 (https://phabricator.wikimedia.org/T137323) (owner: 10Hashar) [18:01:44] (03PS4) 10Paladox: snapshot: Fix variable contains an uppercase letter [puppet] - 10https://gerrit.wikimedia.org/r/308355 (https://phabricator.wikimedia.org/T93645) [18:02:08] (03PS3) 10Paladox: rsync: Fix variable contains an uppercase letter [puppet] - 10https://gerrit.wikimedia.org/r/308350 (https://phabricator.wikimedia.org/T93645) [18:03:03] (03PS5) 10Paladox: aqs and restbase: Fix variable contains an uppercase letter [puppet] - 10https://gerrit.wikimedia.org/r/308328 (https://phabricator.wikimedia.org/T93645) [18:07:38] (03CR) 10Dzahn: [C: 032] Revert "contint:firewall: let phabricator talk to gearman" [puppet] - 10https://gerrit.wikimedia.org/r/310039 (https://phabricator.wikimedia.org/T137323) (owner: 10Hashar) [18:08:23] !log moving to scap deployed configuration for wdqs - T144380 [18:08:24] T144380: Install and configure new WDQS nodes on codfw - https://phabricator.wikimedia.org/T144380 [18:08:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:08:46] (03PS3) 10Gehel: wdqs - use configuration file generated by scap [puppet] - 10https://gerrit.wikimedia.org/r/310308 (https://phabricator.wikimedia.org/T144380) [18:10:08] (03CR) 10Gehel: [C: 032] wdqs - use configuration file generated by scap [puppet] - 10https://gerrit.wikimedia.org/r/310308 (https://phabricator.wikimedia.org/T144380) (owner: 10Gehel) [18:16:23] (03PS1) 10Yuvipanda: labspuppetbackend: Use _ to denote empty [puppet] - 10https://gerrit.wikimedia.org/r/310363 [18:16:28] andrewbogott: ^ [18:17:26] (03CR) 10Andrew Bogott: [C: 031] 
labspuppetbackend: Use _ to denote empty [puppet] - 10https://gerrit.wikimedia.org/r/310363 (owner: 10Yuvipanda) [18:17:31] PROBLEM - thumbor@8826 service on thumbor1001 is CRITICAL: CRITICAL - Expecting active but unit thumbor@8826 is inactive [18:24:12] !log demon@tin Finished scap: testwiki to wmf.19 + l10n bootstrap (try 2) (duration: 47m 21s) [18:24:18] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:24:28] PROBLEM - thumbor@8803 service on thumbor1002 is CRITICAL: CRITICAL - Expecting active but unit thumbor@8803 is inactive [18:26:09] PROBLEM - thumbor@8837 service on thumbor1001 is CRITICAL: CRITICAL - Expecting active but unit thumbor@8837 is inactive [18:28:08] 06Operations, 10ops-codfw, 10ops-eqiad, 10ops-esams, and 3 others: Monitor hardware thermal issues - https://phabricator.wikimedia.org/T125205#1981274 (10Dzahn) The "freeipmi" package exists in jessie (and xenial) but not in precise or trusty. Suggesting to add it in base on all jessie machines though to s... [18:35:04] !log starting data import on wdqs200? [18:35:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:37:26] (03PS1) 10Dzahn: base: install 'freeipmi', 'libipc-run-perl' on jessie [puppet] - 10https://gerrit.wikimedia.org/r/310369 (https://phabricator.wikimedia.org/T125205) [18:37:59] RECOVERY - thumbor@8826 service on thumbor1001 is OK: OK - thumbor@8826 is active [18:39:08] RECOVERY - thumbor@8837 service on thumbor1001 is OK: OK - thumbor@8837 is active [18:44:50] RECOVERY - thumbor@8803 service on thumbor1002 is OK: OK - thumbor@8803 is active [19:00:04] ostriches: Dear anthropoid, the time has come. Please deploy MediaWiki train (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160913T1900). 
[19:00:26] (03CR) 10Chad: [C: 032] group0 to wmf.19 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/310354 (owner: 10Chad) [19:00:55] (03Merged) 10jenkins-bot: group0 to wmf.19 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/310354 (owner: 10Chad) [19:01:55] !log demon@tin rebuilt wikiversions.php and synchronized wikiversions files: group0 to wmf.19 [19:02:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:10:40] 06Operations, 10LDAP-Access-Requests, 06TCB-Team, 06WMDE-Analytics-Engineering, and 2 others: Update wmde LDAP group - https://phabricator.wikimedia.org/T145384#2634313 (10hashar) From a quick look at the doc, roots and members of the `ldap-admins` unix group can do the LDAP modifications. [19:16:18] PROBLEM - thumbor@8806 service on thumbor1001 is CRITICAL: CRITICAL - Expecting active but unit thumbor@8806 is inactive [19:16:24] (03PS1) 10Dzahn: monitoring: add check_ipmi_sensor plugin [puppet] - 10https://gerrit.wikimedia.org/r/310379 (https://phabricator.wikimedia.org/T125205) [19:35:13] PROBLEM - check_puppetrun on thulium is CRITICAL: CRITICAL: Puppet has 1 failures [19:35:19] PROBLEM - thumbor@8806 service on thumbor1002 is CRITICAL: CRITICAL - Expecting active but unit thumbor@8806 is inactive [19:36:18] PROBLEM - thumbor@8813 service on thumbor1002 is CRITICAL: CRITICAL - Expecting active but unit thumbor@8813 is inactive [19:37:09] RECOVERY - thumbor@8806 service on thumbor1001 is OK: OK - thumbor@8806 is active [19:39:47] -spb- [Global notice] We've put a cloth over the self-destruct button so that nobody pushes it again. [19:39:54] -spb- [Global notice] Apologies for the noise there. We don't seem to be under attack, and should be back to normal now...
[19:39:55] LOl [19:40:09] RECOVERY - check_puppetrun on thulium is OK: OK: Puppet is currently enabled, last run 76 seconds ago with 0 failures [19:45:38] RECOVERY - thumbor@8806 service on thumbor1002 is OK: OK - thumbor@8806 is active [19:46:39] RECOVERY - thumbor@8813 service on thumbor1002 is OK: OK - thumbor@8813 is active [19:49:13] (03PS1) 10Dzahn: monitoring/base: add NRPE command to check temperature [puppet] - 10https://gerrit.wikimedia.org/r/310383 (https://phabricator.wikimedia.org/T125205) [19:50:38] (03CR) 10jenkins-bot: [V: 04-1] monitoring/base: add NRPE command to check temperature [puppet] - 10https://gerrit.wikimedia.org/r/310383 (https://phabricator.wikimedia.org/T125205) (owner: 10Dzahn) [19:53:07] (03PS2) 10Dzahn: monitoring/base: add NRPE command to check temperature [puppet] - 10https://gerrit.wikimedia.org/r/310383 (https://phabricator.wikimedia.org/T125205) [19:53:31] (03CR) 10Paladox: [C: 031] monitoring/base: add NRPE command to check temperature [puppet] - 10https://gerrit.wikimedia.org/r/310383 (https://phabricator.wikimedia.org/T125205) (owner: 10Dzahn) [19:53:50] PROBLEM - thumbor@8836 service on thumbor1002 is CRITICAL: CRITICAL - Expecting active but unit thumbor@8836 is inactive [19:58:02] PROBLEM - thumbor@8827 service on thumbor1002 is CRITICAL: CRITICAL - Expecting active but unit thumbor@8827 is inactive [20:04:12] ostriches: Could I get https://gerrit.wikimedia.org/r/#/c/310385/ into wmf.19 before it goes to any wikis please? 
[20:04:27] It reverts code that's going to do scary things to global memc keys [20:05:55] Oh, ugh, group0 hit an hour ago [20:06:55] I'll just put it in SWAT then [20:07:23] Ok, who would have thought there were so many swearwords on Wiktionary https://en.wiktionary.org/wiki/Category:English_vulgarities [20:07:25] LOL [20:09:47] paladox: yep, they all belong there like any other word, also slang and everything [20:09:56] LOL [20:10:31] i have been translating a bunch of the German ones [20:10:47] * bawolff thinks that clearly paladox has not found the darker sides of commons... [20:11:24] bawolff LOL, what, there's a darker side? [20:11:39] oh noes [20:13:52] RECOVERY - thumbor@8827 service on thumbor1002 is OK: OK - thumbor@8827 is active [20:14:53] RECOVERY - thumbor@8836 service on thumbor1002 is OK: OK - thumbor@8836 is active [20:18:34] (03CR) 10Dzahn: [C: 031] quarry: migrate classes to autoloader layout [puppet] - 10https://gerrit.wikimedia.org/r/308313 (https://phabricator.wikimedia.org/T93645) (owner: 10Hashar) [20:20:54] (03CR) 10Dzahn: "i'd like to see it merged, if Yuvipanda is fine with it" [puppet] - 10https://gerrit.wikimedia.org/r/308313 (https://phabricator.wikimedia.org/T93645) (owner: 10Hashar) [20:21:43] RoanKattouw: The train was at noon, you missed it by a mile....
[20:21:51] !log restart rabbitmq-server on labtestcontrol2001 [20:21:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:22:00] doesn't seem to have fixed anything yet :/ [20:22:02] (03PS2) 10Madhuvishy: dynamicproxy: Override nginx worker_connections default [puppet] - 10https://gerrit.wikimedia.org/r/309450 (https://phabricator.wikimedia.org/T143637) [20:22:30] RoanKattouw: We can do it right now if you want tho [20:23:04] aha [20:23:25] (03PS6) 10Dzahn: facilities: Fix variable contains an uppercase letter [puppet] - 10https://gerrit.wikimedia.org/r/308338 (https://phabricator.wikimedia.org/T93645) (owner: 10Paladox) [20:23:34] (03CR) 10Dzahn: [C: 032] facilities: Fix variable contains an uppercase letter [puppet] - 10https://gerrit.wikimedia.org/r/308338 (https://phabricator.wikimedia.org/T93645) (owner: 10Paladox) [20:23:35] !log restarted designate-api on labtestservices2001, now designate in labtest is working again [20:23:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:26:14] (03CR) 10Dzahn: [C: 031] labs_dns: Fix optional parameter listed before required parameter [puppet] - 10https://gerrit.wikimedia.org/r/308340 (https://phabricator.wikimedia.org/T93645) (owner: 10Paladox) [20:29:31] (03CR) 10Dzahn: [C: 031] snapshot: Fix variable contains an uppercase letter [puppet] - 10https://gerrit.wikimedia.org/r/308355 (https://phabricator.wikimedia.org/T93645) (owner: 10Paladox) [20:31:24] (03CR) 10Dzahn: [C: 031] rsync: Fix variable contains an uppercase letter [puppet] - 10https://gerrit.wikimedia.org/r/308350 (https://phabricator.wikimedia.org/T93645) (owner: 10Paladox) [20:32:31] (03CR) 10Dzahn: [C: 032] "comes from puppetlabs-rsync: https://github.com/puppetlabs/puppetlabs-rsync" [puppet] - 10https://gerrit.wikimedia.org/r/308350 (https://phabricator.wikimedia.org/T93645) (owner: 10Paladox) [20:32:46] (03PS4) 10Dzahn: rsync: Fix variable contains an uppercase letter [puppet] - 
10https://gerrit.wikimedia.org/r/308350 (https://phabricator.wikimedia.org/T93645) (owner: 10Paladox) [20:34:22] ostriches: Yes please [20:38:10] RoanKattouw: merging [20:38:58] (03CR) 10Dzahn: "i don't know what this does" [puppet] - 10https://gerrit.wikimedia.org/r/308753 (https://phabricator.wikimedia.org/T85002) (owner: 10Paladox) [20:40:34] (03CR) 10Paladox: "What this does is it supports searching for example T1, bug:T1, since we support bug:1566 and soo on but doint support searching for tasks" [puppet] - 10https://gerrit.wikimedia.org/r/308753 (https://phabricator.wikimedia.org/T85002) (owner: 10Paladox) [20:40:41] (03PS6) 10Paladox: Add support for searching gerrit using bug:T1 [puppet] - 10https://gerrit.wikimedia.org/r/308753 (https://phabricator.wikimedia.org/T85002) [20:40:44] (03CR) 10Dzahn: "alright, thanks 20after4" [puppet] - 10https://gerrit.wikimedia.org/r/281071 (https://phabricator.wikimedia.org/T131622) (owner: 10Paladox) [20:41:15] (03CR) 10Dzahn: [C: 032] Set differential.allow-self-accept to true in phabricator [puppet] - 10https://gerrit.wikimedia.org/r/281071 (https://phabricator.wikimedia.org/T131622) (owner: 10Paladox) [20:41:37] thanks ^^ mutante :) [20:42:08] (03CR) 10Dzahn: [C: 031] "withdrew my -2, there is a dependency on this one though" [puppet] - 10https://gerrit.wikimedia.org/r/281071 (https://phabricator.wikimedia.org/T131622) (owner: 10Paladox) [20:42:26] paladox: "submit incl. parents" etc... 
[20:42:55] paladox: can you break that dependency maybe, i'll get back to it later
[20:43:06] Ok
[20:43:10] afk, and bbl
[20:43:32] ok
[20:45:12] (PS6) Paladox: Set differential.allow-self-accept to true in phabricator [puppet] - https://gerrit.wikimedia.org/r/281071 (https://phabricator.wikimedia.org/T131622)
[20:45:25] (Draft2) Paladox: Set differential.allow-self-accept to true in phabricator [puppet] - https://gerrit.wikimedia.org/r/310394 (https://phabricator.wikimedia.org/T131622)
[20:45:39] (Abandoned) Paladox: Set differential.allow-self-accept to true in phabricator [puppet] - https://gerrit.wikimedia.org/r/281071 (https://phabricator.wikimedia.org/T131622) (owner: Paladox)
[20:45:46] (Draft1) Paladox: Set differential.allow-self-accept to true in phabricator [puppet] - https://gerrit.wikimedia.org/r/310394 (https://phabricator.wikimedia.org/T131622)
[20:46:17] mutante i did it in https://gerrit.wikimedia.org/r/#/c/310394/ instead
[20:46:18] :)
[20:46:25] !log demon@tin Synchronized php-1.28.0-wmf.19/extensions/Echo: For Roan <3 (duration: 00m 54s)
[20:46:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[20:46:31] RoanKattouw: ^^
[20:47:55] (CR) Yuvipanda: [C: -1] dynamicproxy: Override nginx worker_connections default (2 comments) [puppet] - https://gerrit.wikimedia.org/r/309450 (https://phabricator.wikimedia.org/T143637) (owner: Madhuvishy)
[20:49:19] (PS3) Madhuvishy: dynamicproxy: Override nginx worker_connections default [puppet] - https://gerrit.wikimedia.org/r/309450 (https://phabricator.wikimedia.org/T143637)
[20:50:32] (PS4) Madhuvishy: dynamicproxy: Override nginx worker_connections default [puppet] - https://gerrit.wikimedia.org/r/309450 (https://phabricator.wikimedia.org/T143637)
[20:50:49] (PS5) Madhuvishy: dynamicproxy: Override nginx worker_connections default [puppet] - https://gerrit.wikimedia.org/r/309450 (https://phabricator.wikimedia.org/T143637)
[20:51:57] (CR) Yuvipanda: [C: 1] dynamicproxy: Override nginx worker_connections default [puppet] - https://gerrit.wikimedia.org/r/309450 (https://phabricator.wikimedia.org/T143637) (owner: Madhuvishy)
[20:52:15] (CR) Madhuvishy: [C: 2] dynamicproxy: Override nginx worker_connections default [puppet] - https://gerrit.wikimedia.org/r/309450 (https://phabricator.wikimedia.org/T143637) (owner: Madhuvishy)
[20:59:37] Thanks ostriches !
[20:59:46] yw
[21:08:06] (CR) Hashar: "Gone from both gallium and contint1001. Thank you!" [puppet] - https://gerrit.wikimedia.org/r/310039 (https://phabricator.wikimedia.org/T137323) (owner: Hashar)
[21:19:10] (CR) Hashar: [C: 1] base: install 'freeipmi', 'libipc-run-perl' on jessie [puppet] - https://gerrit.wikimedia.org/r/310369 (https://phabricator.wikimedia.org/T125205) (owner: Dzahn)
[21:36:38] (PS2) Yuvipanda: labspuppetbackend: Use _ to denote empty [puppet] - https://gerrit.wikimedia.org/r/310363
[21:36:46] (CR) Yuvipanda: [C: 2 V: 2] labspuppetbackend: Use _ to denote empty [puppet] - https://gerrit.wikimedia.org/r/310363 (owner: Yuvipanda)
[21:43:40] Operations, Mail, OTRS, Wiki-Loves-Monuments: E-mails not being received by OTRS - https://phabricator.wikimedia.org/T145293#2625771 (faidon) Regarding info-nl@wikilovesmonuments.eu: this domain has no MX record, its A does not respond to port 25 and its AAAA points to an unreachable address. Ema...
[21:45:15] !log demon@tin Synchronized php-1.28.0-wmf.19/includes/DefaultSettings.php: for lego <3 (duration: 00m 47s)
[21:45:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[21:45:21] legoktm: ^^^
[21:45:24] <3
[21:45:27] thank you!
[21:45:55] yw
[22:02:26] (CR) Mobrovac: "Are variables not supposed to contain uppercase letters?" [puppet] - https://gerrit.wikimedia.org/r/308328 (https://phabricator.wikimedia.org/T93645) (owner: Paladox)
[22:02:53] (CR) Paladox: "Yep since puppet-lint keeps complaining about that." [puppet] - https://gerrit.wikimedia.org/r/308328 (https://phabricator.wikimedia.org/T93645) (owner: Paladox)
[22:08:20] (CR) Mobrovac: [C: -1] aqs and restbase: Fix variable contains an uppercase letter (6 comments) [puppet] - https://gerrit.wikimedia.org/r/308328 (https://phabricator.wikimedia.org/T93645) (owner: Paladox)
[22:09:58] PROBLEM - puppet last run on es2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[22:12:30] PROBLEM - puppet last run on cp3004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[22:13:08] !log demon@tin Synchronized php-1.28.0-wmf.19/includes/MediaWiki.php: Avoid stupid warnings on url parsing (duration: 00m 47s)
[22:13:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[22:17:21] PROBLEM - puppet last run on elastic2019 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[22:34:37] (PS1) Mobrovac: Conftool: Create script that checks the state after (de)pooling [puppet] - https://gerrit.wikimedia.org/r/310454 (https://phabricator.wikimedia.org/T145518)
[22:35:20] RECOVERY - puppet last run on es2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[22:35:49] (CR) jenkins-bot: [V: -1] Conftool: Create script that checks the state after (de)pooling [puppet] - https://gerrit.wikimedia.org/r/310454 (https://phabricator.wikimedia.org/T145518) (owner: Mobrovac)
[22:37:00] (PS2) Mobrovac: Conftool: Create script that checks the state after (de)pooling [puppet] - https://gerrit.wikimedia.org/r/310454 (https://phabricator.wikimedia.org/T145518)
[22:37:59] RECOVERY - puppet last run on cp3004 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[22:41:05] (PS1) ArielGlenn: fix up the find page in last 500 lines "temporary" hack [dumps] - https://gerrit.wikimedia.org/r/310456
[22:41:21] (CR) jenkins-bot: [V: -1] fix up the find page in last 500 lines "temporary" hack [dumps] - https://gerrit.wikimedia.org/r/310456 (owner: ArielGlenn)
[22:42:24] (PS2) ArielGlenn: fix up the find page in last 500 lines "temporary" hack [dumps] - https://gerrit.wikimedia.org/r/310456
[22:42:44] RECOVERY - puppet last run on elastic2019 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[22:43:38] where do I do puppet-merge these days I wonder
[22:43:43] still palladium?
[22:43:57] oh but this is scap. so I can avoid the issue, heh
[22:44:40] (CR) ArielGlenn: [C: 2] fix up the find page in last 500 lines "temporary" hack [dumps] - https://gerrit.wikimedia.org/r/310456 (owner: ArielGlenn)
[22:49:48] gwicke: I added one line to the doc
[22:54:42] (PS3) Mobrovac: Conftool: Create script that checks the state after (de)pooling [puppet] - https://gerrit.wikimedia.org/r/310454 (https://phabricator.wikimedia.org/T145518)
[22:55:01] (PS3) Andrew Bogott: puppet panel: Fix incorrect path [puppet] - https://gerrit.wikimedia.org/r/310361
[22:56:44] (CR) Andrew Bogott: [C: 2] puppet panel: Fix incorrect path [puppet] - https://gerrit.wikimedia.org/r/310361 (owner: Andrew Bogott)
[23:00:05] RoanKattouw, ostriches, MaxSem, and Dereckson: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20160913T2300).
[23:00:05] RoanKattouw: A patch you scheduled for Evening SWAT (Max 8 patches) is about to be deployed. Please be available during the process.
[23:01:34] * RoanKattouw waves
[23:02:03] * RoanKattouw decides to do his own SWAT
[23:15:16] Hi.
[23:15:41] RoanKattouw: so your laptop backup has been recovered now? :)
[23:15:49] Yes :)
[23:17:59] (PS1) Alex Monk: conftool: get conf from class parameters [puppet] - https://gerrit.wikimedia.org/r/310459
[23:19:49] !log catrope@tin Synchronized php-1.28.0-wmf.19/resources/lib/moment/locale: T145382 (duration: 00m 49s)
[23:19:50] T145382: "TypeError: moment is undefined" when using Echo - https://phabricator.wikimedia.org/T145382
[23:19:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[23:32:09] !log catrope@tin Synchronized php-1.28.0-wmf.18/resources/lib/moment/locale: T145382 (duration: 00m 47s)
[23:32:11] T145382: "TypeError: moment is undefined" when using Echo - https://phabricator.wikimedia.org/T145382
[23:32:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master