[00:07:12] Operations, Deployment-Systems, Scap3 (Scap3-MediaWiki-MVP): Completely port l10nupdate to scap - https://phabricator.wikimedia.org/T133913#2420519 (greg)
[00:20:38] !log maxsem@tin Finished scap: https://gerrit.wikimedia.org/r/#/c/296819/ - noop in prod (duration: 27m 27s)
[00:20:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[00:21:21] kaldari, ^
[00:22:09] looking
[00:22:38] everything looks kosher
[00:22:44] thanks!
[01:04:24] !log ori@tin Synchronized php-1.28.0-wmf.8/extensions/WikimediaEvents/modules/ext.wikimediaEvents.deprecate.js: Ie28d823c: Log ResourceLoader URL-splitting (duration: 00m 32s)
[01:04:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:04:49] (CR) Ori.livneh: [C: 2] HD logo for en.wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/296851 (https://phabricator.wikimedia.org/T138801) (owner: Dereckson)
[01:05:25] (Merged) jenkins-bot: HD logo for en.wiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/296851 (https://phabricator.wikimedia.org/T138801) (owner: Dereckson)
[01:07:34] !log ori@tin Synchronized static/images/project-logos: Ie8a71af5: HD logo for en.wiktionary (1/2) (duration: 00m 28s)
[01:07:39] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:08:20] !log ori@tin Synchronized wmf-config/InitialiseSettings.php: Ie8a71af5: HD logo for en.wiktionary (2/2) (duration: 00m 27s)
[01:08:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:44:13] (PS1) Aaron Schulz: Enable LocalFile log [mediawiki-config] - https://gerrit.wikimedia.org/r/296861
[01:44:28] (CR) Aaron Schulz: [C: 2] Enable LocalFile log [mediawiki-config] - https://gerrit.wikimedia.org/r/296861 (owner: Aaron Schulz)
[01:45:03] (Merged) jenkins-bot: Enable LocalFile log [mediawiki-config] - https://gerrit.wikimedia.org/r/296861 (owner: Aaron Schulz)
[01:45:38] (PS1) Ori.livneh: Bump $wgResourceLoaderStorageEnabled to 3,000 [mediawiki-config] - https://gerrit.wikimedia.org/r/296862
[01:46:02] (PS2) Ori.livneh: Bump $wgResourceLoaderStorageEnabled to 3,000 [mediawiki-config] - https://gerrit.wikimedia.org/r/296862
[01:46:13] (CR) Ori.livneh: [C: 2] Bump $wgResourceLoaderStorageEnabled to 3,000 [mediawiki-config] - https://gerrit.wikimedia.org/r/296862 (owner: Ori.livneh)
[01:46:25] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[01:46:34] (CR) Ori.livneh: [C: -2] Bump $wgResourceLoaderStorageEnabled to 3,000 [mediawiki-config] - https://gerrit.wikimedia.org/r/296862 (owner: Ori.livneh)
[01:46:50] !log aaron@tin Synchronized wmf-config/InitialiseSettings.php: Enable LocalFile log (duration: 00m 32s)
[01:46:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[01:46:57] (PS3) Ori.livneh: Bump $wgResourceLoaderMaxQueryLength to 3,000 [mediawiki-config] - https://gerrit.wikimedia.org/r/296862
[01:47:52] bearND: ^ see alert above for mobileapps
[01:48:07] (CR) Ori.livneh: [C: 2] Bump $wgResourceLoaderMaxQueryLength to 3,000 [mediawiki-config] - https://gerrit.wikimedia.org/r/296862 (owner: Ori.livneh)
[01:48:35] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy
[01:48:45] (Merged) jenkins-bot: Bump $wgResourceLoaderMaxQueryLength to 3,000 [mediawiki-config] - https://gerrit.wikimedia.org/r/296862 (owner: Ori.livneh)
[01:59:33] !log ori@tin Synchronized wmf-config/CommonSettings.php: I3a8057a8: Bump $wgResourceLoaderMaxQueryLength to 3,000 (duration: 00m 28s)
[01:59:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:08:10] (CR) BBlack: [C: -1] Add Content-Security-Policy to images from test[2]wiki (1 comment) [puppet] - https://gerrit.wikimedia.org/r/296634 (https://phabricator.wikimedia.org/T117618) (owner: Brian Wolff)
[02:12:50] Operations, Traffic, Patch-For-Review: Decrease max object TTL in varnishes - https://phabricator.wikimedia.org/T124954#2420580 (BBlack) @Krinkle - I think 14d for the maximum s-maxage MW advertises to Varnish is fine for now. We'd obviously like to, in the long run, get the effective lifetimes even...
[02:19:25] RECOVERY - Router interfaces on cr2-knams is OK: OK: host 91.198.174.246, interfaces up: 57, down: 0, dormant: 0, excluded: 0, unused: 0
[02:38:54] !log mwdeploy@tin scap sync-l10n completed (1.28.0-wmf.8) (duration: 17m 02s)
[02:39:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[02:44:01] !log l10nupdate@tin ResourceLoader cache refresh completed at Fri Jul 1 02:44:01 UTC 2016 (duration 5m 7s)
[02:44:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[03:49:00] PROBLEM - Hadoop DataNode on analytics1034 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode
[03:57:59] PROBLEM - HP RAID on ms-be2022 is CRITICAL: CHECK_NRPE: Socket timeout after 40 seconds.
[04:00:00] RECOVERY - HP RAID on ms-be2022 is OK: OK: Slot 3: OK: 2I:4:1, 2I:4:2, 1I:1:5, 1I:1:6, 1I:1:7, 1I:1:8, 1I:1:1, 1I:1:2, 1I:1:3, 1I:1:4, 2I:2:1, 2I:2:2, 2I:2:3, 2I:2:4, Controller, Battery/Capacitor
[04:00:10] RECOVERY - Hadoop DataNode on analytics1034 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode
[04:15:26] PROBLEM - mobileapps endpoints health on scb1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[04:17:36] RECOVERY - mobileapps endpoints health on scb1002 is OK: All endpoints are healthy
[04:39:06] PROBLEM - Hadoop DataNode on analytics1033 is CRITICAL: PROCS CRITICAL: 0 processes with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode
[04:43:36] RECOVERY - Hadoop DataNode on analytics1033 is OK: PROCS OK: 1 process with command name java, args org.apache.hadoop.hdfs.server.datanode.DataNode
[05:36:52] fyi we are working on --^, it seems that the HDFS datanode java process consumes all the available Heap getting JavaOOM exceptions (-Xmx 1GB for the moment)
[05:38:46] PROBLEM - puppet last run on mw2066 is CRITICAL: CRITICAL: puppet fail
[05:41:37] PROBLEM - puppet last run on mw2173 is CRITICAL: CRITICAL: Puppet has 1 failures
[05:46:47] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[05:48:57] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy
[06:05:37] RECOVERY - puppet last run on mw2066 is OK: OK: Puppet is currently enabled, last run 3 seconds ago with 0 failures
[06:06:17] RECOVERY - puppet last run on mw2173 is OK: OK: Puppet is currently enabled, last run 26 seconds ago with 0 failures
[06:18:29] !log rolling reboot of wtp1* for kernel security update
[06:18:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[06:28:49] !log resuming rolling reboots of elastic* clusters in eqiad and codfw
[06:28:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[06:30:38] PROBLEM - puppet last run on neodymium is CRITICAL: CRITICAL: puppet fail
[06:31:36] PROBLEM - puppet last run on cp2013 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:31:56] PROBLEM - puppet last run on db2055 is CRITICAL: CRITICAL: puppet fail
[06:32:24] PROBLEM - puppet last run on mw2158 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:32:36] PROBLEM - puppet last run on mw1135 is CRITICAL: CRITICAL: Puppet has 2 failures
[06:32:45] PROBLEM - puppet last run on mw2207 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:33:04] PROBLEM - puppet last run on wtp2008 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:34:54] PROBLEM - puppet last run on wtp2017 is CRITICAL: CRITICAL: Puppet has 1 failures
[06:35:06] PROBLEM - puppet last run on silver is CRITICAL: CRITICAL: Puppet has 1 failures
[06:47:35] PROBLEM - puppet last run on wtp2005 is CRITICAL: CRITICAL: puppet fail
[06:56:35] RECOVERY - puppet last run on mw2207 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[06:56:45] RECOVERY - puppet last run on neodymium is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures
[06:57:05] RECOVERY - puppet last run on wtp2008 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:57:15] RECOVERY - puppet last run on db2055 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[06:57:25] RECOVERY - puppet last run on wtp2017 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:05] RECOVERY - puppet last run on mw2158 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:44] RECOVERY - puppet last run on cp2013 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:58:45] RECOVERY - puppet last run on mw1135 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures
[07:00:04] RECOVERY - puppet last run on silver is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:02:30] PROBLEM - puppet last run on analytics1047 is CRITICAL: CRITICAL: Puppet has 1 failures
[07:07:20] PROBLEM - puppet last run on db1039 is CRITICAL: CRITICAL: Puppet has 1 failures
[07:15:20] RECOVERY - puppet last run on wtp2005 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:25:49] RECOVERY - puppet last run on analytics1047 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[07:28:47] (CR) Muehlenhoff: Introduce $::networks::constants::networks (1 comment) [puppet] - https://gerrit.wikimedia.org/r/296375 (owner: Alexandros Kosiaris)
[07:30:01] RECOVERY - puppet last run on db1039 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[07:30:16] (CR) Muehlenhoff: [C: 1] Introduce $::networks::constants::labs_networks [puppet] - https://gerrit.wikimedia.org/r/296376 (owner: Alexandros Kosiaris)
[08:22:19] (CR) Hashar: "Zeljko: we will come back to it soonish. We apparently found a good way to enhance the tests, I got patches for the 'network' and 'postgr" [puppet] - https://gerrit.wikimedia.org/r/178810 (https://phabricator.wikimedia.org/T78342) (owner: Hashar)
[08:27:52] good morning
[08:32:33] !log rolling reboot of swift frontends in codfw
[08:32:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:41:56] !log powercycling ms-fe2001, stuck after reboot
[08:42:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[08:48:35] moritzm: off to a good start :(
[08:48:49] heheh
[08:49:04] it does take a bit of time to reboot but usually it is fine, e.g. at grub
[08:58:25] (PS1) Yuvipanda: Set HOME and cwd explicitly [software/tools-webservice] - https://gerrit.wikimedia.org/r/296867
[08:58:36] (PS2) Yuvipanda: Set HOME and cwd explicitly [software/tools-webservice] - https://gerrit.wikimedia.org/r/296867
[08:58:45] (CR) jenkins-bot: [V: -1] Set HOME and cwd explicitly [software/tools-webservice] - https://gerrit.wikimedia.org/r/296867 (owner: Yuvipanda)
[09:03:03] !log powercycling ms-fe2002, stuck after reboot
[09:03:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:15:41] !log powercycling ms-fe2003, stuck after reboot
[09:15:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:24:13] godog: good morning :) can we attempt to get the arcanist package build for Trusty?
[09:24:30] though I am not on my linux build host right now so can't really build it myself to verify it will works :(
[09:24:40] ( https://phabricator.wikimedia.org/T137770 )
[09:25:28] hashar: yeah, I'm about to jump in a meeting, though a labs host with package_builder role would work as well if you don't have your linux box
[09:26:17] godog: ti would hopefully just work, bunch of it is just plain php with the xhast binary having loose deps
[09:26:46] (PS1) Yuvipanda: Mount all of /data/project [software/tools-webservice] - https://gerrit.wikimedia.org/r/296868
[09:27:05] (CR) jenkins-bot: [V: -1] Mount all of /data/project [software/tools-webservice] - https://gerrit.wikimedia.org/r/296868 (owner: Yuvipanda)
[09:27:33] (CR) Alexandros Kosiaris: Introduce $::networks::constants::networks (1 comment) [puppet] - https://gerrit.wikimedia.org/r/296375 (owner: Alexandros Kosiaris)
[09:27:39] godog: lets check post lunch so ;)
[09:28:16] hashar: ok!
[09:29:19] akosiaris: I went berserk yesterday with puppet/rspec. I have found out puppetlabs_spec_helper has all the logic to setup/teardown the fixtures ! That let us remove a lot of code from the Rakefile from multiple modules. An example for network module is https://gerrit.wikimedia.org/r/#/c/296821/
[09:29:59] akosiaris: and I have migrated the postgresql one as well https://gerrit.wikimedia.org/r/#/c/296829/ . The trick is to list the modules in a .fixtures.yml file and puppetlabs_spec_helper process from there
[09:30:14] both run fine on my machine with bundle install && bundle exec rake spec
[09:35:21] (CR) Muehlenhoff: Introduce $::networks::constants::networks (1 comment) [puppet] - https://gerrit.wikimedia.org/r/296375 (owner: Alexandros Kosiaris)
[09:35:43] Blocked-on-Operations, Labs, Labs-Infrastructure, Monitoring: Provide a grafana installation for labs - https://phabricator.wikimedia.org/T137216#2420874 (akosiaris)
[09:35:58] !log rolling reboot of swift backends in codfw
[09:36:00] (PS2) Yuvipanda: Mount all of /data/project [software/tools-webservice] - https://gerrit.wikimedia.org/r/296868
[09:36:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:38:03] !log rebooted eventlog2001.codfw.wmnet for kernel upgrades
[09:38:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[09:40:03] (CR) Alexandros Kosiaris: [C: 2] remove misc/monitoring.pp, remove import 'misc/*.pp' [puppet] - https://gerrit.wikimedia.org/r/296817 (owner: Dzahn)
[09:40:09] (PS8) Alexandros Kosiaris: remove misc/monitoring.pp, remove import 'misc/*.pp' [puppet] - https://gerrit.wikimedia.org/r/296817 (owner: Dzahn)
[09:40:32] hashar: yeah I noticed. Thanks!!! It looks really promising
[09:41:05] (PS1) Hashar: bacula: mysql_bpipe_spec requires processorcount [puppet] - https://gerrit.wikimedia.org/r/296869
[09:41:49] akosiaris: after three years, I think I finally managed to catch up with rspec-puppet :-}
[09:43:10] (CR) Alexandros Kosiaris: [V: 2] remove misc/monitoring.pp, remove import 'misc/*.pp' [puppet] - https://gerrit.wikimedia.org/r/296817 (owner: Dzahn)
[09:47:40] (PS4) Muehlenhoff: Add ferm rules (and role) for pybal-test [puppet] - https://gerrit.wikimedia.org/r/294058
[09:47:56] (CR) Alexandros Kosiaris: "works fine on uranium. Thank you!" [puppet] - https://gerrit.wikimedia.org/r/296817 (owner: Dzahn)
[09:50:09] (PS3) Yuvipanda: Set HOME and cwd explicitly [software/tools-webservice] - https://gerrit.wikimedia.org/r/296867
[09:50:11] (PS3) Yuvipanda: Mount all of /data/project [software/tools-webservice] - https://gerrit.wikimedia.org/r/296868
[09:53:09] (CR) Gehel: [C: 1] "This work fine for me locally. I love changes that remove more lines than they add! Separating dependencies in a .fixture file is so much " [puppet] - https://gerrit.wikimedia.org/r/296821 (https://phabricator.wikimedia.org/T78342) (owner: Hashar)
[09:58:36] (PS5) Muehlenhoff: Add ferm rules (and role) for pybal-test [puppet] - https://gerrit.wikimedia.org/r/294058
[10:05:53] (CR) Muehlenhoff: [C: 2 V: 2] Add ferm rules (and role) for pybal-test [puppet] - https://gerrit.wikimedia.org/r/294058 (owner: Muehlenhoff)
[10:06:54] (PS1) Hashar: base: fix various spec issues [puppet] - https://gerrit.wikimedia.org/r/296871
[10:07:22] (CR) Alexandros Kosiaris: [C: 1] syslog: limit source range to $PRODUCTION_NETWORKS [puppet] - https://gerrit.wikimedia.org/r/295368 (owner: Filippo Giunchedi)
[10:07:51] (CR) Hashar: "Yet another spec fix :-}" [puppet] - https://gerrit.wikimedia.org/r/296871 (owner: Hashar)
[10:09:14] (CR) Hashar: "Removal of "remove import 'misc/*.pp'" from site.pp is definitely an achievement!! Kudos." [puppet] - https://gerrit.wikimedia.org/r/296817 (owner: Dzahn)
[10:11:03] (PS1) Muehlenhoff: Enable base::firewall for pybal-test [puppet] - https://gerrit.wikimedia.org/r/296872
[10:16:11] (CR) Alexandros Kosiaris: [C: -1] Check all ores web nodes (2 comments) [puppet] - https://gerrit.wikimedia.org/r/296535 (https://phabricator.wikimedia.org/T134782) (owner: Nschaaf)
[10:17:51] (CR) Alexandros Kosiaris: [C: 2] network: simplify tests with puppetlabs_spec_helper [puppet] - https://gerrit.wikimedia.org/r/296821 (https://phabricator.wikimedia.org/T78342) (owner: Hashar)
[10:17:57] (PS3) Alexandros Kosiaris: network: simplify tests with puppetlabs_spec_helper [puppet] - https://gerrit.wikimedia.org/r/296821 (https://phabricator.wikimedia.org/T78342) (owner: Hashar)
[10:18:08] (CR) Alexandros Kosiaris: [V: 2] network: simplify tests with puppetlabs_spec_helper [puppet] - https://gerrit.wikimedia.org/r/296821 (https://phabricator.wikimedia.org/T78342) (owner: Hashar)
[10:18:48] (PS9) Nschaaf: Check all ores web nodes [puppet] - https://gerrit.wikimedia.org/r/296535 (https://phabricator.wikimedia.org/T134782)
[10:22:19] (CR) Muehlenhoff: [C: 2 V: 2] Enable base::firewall for pybal-test [puppet] - https://gerrit.wikimedia.org/r/296872 (owner: Muehlenhoff)
[10:22:46] (PS2) Muehlenhoff: Enable base::firewall for pybal-test [puppet] - https://gerrit.wikimedia.org/r/296872
[10:22:53] (CR) Muehlenhoff: [V: 2] Enable base::firewall for pybal-test [puppet] - https://gerrit.wikimedia.org/r/296872 (owner: Muehlenhoff)
[10:22:55] (PS1) Yuvipanda: Bump deb version [software/tools-webservice] - https://gerrit.wikimedia.org/r/296874
[10:29:29] (CR) Alexandros Kosiaris: [C: 2] postgresql: simplify tests with puppetlabs_spec_helper [puppet] - https://gerrit.wikimedia.org/r/296829 (owner: Hashar)
[10:29:34] (PS2) Alexandros Kosiaris: postgresql: simplify tests with puppetlabs_spec_helper [puppet] - https://gerrit.wikimedia.org/r/296829 (owner: Hashar)
[10:29:39] (CR) Alexandros Kosiaris: [V: 2] postgresql: simplify tests with puppetlabs_spec_helper [puppet] - https://gerrit.wikimedia.org/r/296829 (owner: Hashar)
[10:31:28] (CR) Yuvipanda: [C: 2] Set HOME and cwd explicitly [software/tools-webservice] - https://gerrit.wikimedia.org/r/296867 (owner: Yuvipanda)
[10:31:38] (CR) Yuvipanda: [C: 2] Mount all of /data/project [software/tools-webservice] - https://gerrit.wikimedia.org/r/296868 (owner: Yuvipanda)
[10:31:46] (CR) Yuvipanda: [C: 2] Bump deb version [software/tools-webservice] - https://gerrit.wikimedia.org/r/296874 (owner: Yuvipanda)
[10:32:03] (CR) Alexandros Kosiaris: [C: 2] bacula: mysql_bpipe_spec requires processorcount [puppet] - https://gerrit.wikimedia.org/r/296869 (owner: Hashar)
[10:32:08] (PS2) Alexandros Kosiaris: bacula: mysql_bpipe_spec requires processorcount [puppet] - https://gerrit.wikimedia.org/r/296869 (owner: Hashar)
[10:32:13] (CR) Alexandros Kosiaris: [V: 2] bacula: mysql_bpipe_spec requires processorcount [puppet] - https://gerrit.wikimedia.org/r/296869 (owner: Hashar)
[10:32:43] (CR) Alexandros Kosiaris: [C: 2] base: fix various spec issues [puppet] - https://gerrit.wikimedia.org/r/296871 (owner: Hashar)
[10:33:22] akosiaris: gehel funny thing is the puppetlabs_spec_helper comes with rake tasks to run puppet validate , ruby syntax check and puppet-listing ;-)
[10:33:38] (Merged) jenkins-bot: Bump deb version [software/tools-webservice] - https://gerrit.wikimedia.org/r/296874 (owner: Yuvipanda)
[10:33:56] (PS2) Alexandros Kosiaris: base: fix various spec issues [puppet] - https://gerrit.wikimedia.org/r/296871 (owner: Hashar)
[10:33:58] (CR) Alexandros Kosiaris: [V: 2] base: fix various spec issues [puppet] - https://gerrit.wikimedia.org/r/296871 (owner: Hashar)
[10:34:43] bundle exec rake release_checks
[10:38:31] Blocked-on-Operations, Continuous-Integration-Infrastructure, Packaging, Gerrit-Migration, and 2 others: Package xhpast (libphutil) - https://phabricator.wikimedia.org/T137770#2420994 (fgiunchedi) as it turns out, the debs work fine in trusty too, I've copied them to `trusty-wikimedia`: ``` root...
[10:38:34] hashar: ^
[10:38:52] godog: so just copy pasted ? ;-)
[10:39:08] hashar: essentially yeah, can you try too on the trusty machines?
[10:39:21] awesome
[10:39:24] yeah going to try it
[10:41:28] (CR) Filippo Giunchedi: [C: 1] "LGTM, OK on $domain_networks" [puppet] - https://gerrit.wikimedia.org/r/296375 (owner: Alexandros Kosiaris)
[10:43:19] (CR) Jcrespo: [C: -1] "this sends logs to a file called syslog, the actual option is called "syslog": http://dev.mysql.com/doc/refman/5.6/en/mysqld-safe.html#opt" [puppet] - https://gerrit.wikimedia.org/r/296713 (https://phabricator.wikimedia.org/T119370) (owner: Hashar)
[10:43:21] (PS3) Addshore: Deploy RevisionSlider to test test2 and testwikidata [mediawiki-config] - https://gerrit.wikimedia.org/r/296753 (https://phabricator.wikimedia.org/T138943)
[10:43:28] (PS3) Filippo Giunchedi: syslog: limit source range to $PRODUCTION_NETWORKS [puppet] - https://gerrit.wikimedia.org/r/295368
[10:43:35] (CR) Filippo Giunchedi: [C: 2 V: 2] syslog: limit source range to $PRODUCTION_NETWORKS [puppet] - https://gerrit.wikimedia.org/r/295368 (owner: Filippo Giunchedi)
[10:45:55] (PS2) Alexandros Kosiaris: Introduce $::networks::constants::networks [puppet] - https://gerrit.wikimedia.org/r/296375
[10:47:47] (PS4) Addshore: Deploy RevisionSlider to test and test2 [mediawiki-config] - https://gerrit.wikimedia.org/r/296753 (https://phabricator.wikimedia.org/T138943)
[10:48:26] (PS3) Alexandros Kosiaris: Introduce $::networks::constants::domain_networks [puppet] - https://gerrit.wikimedia.org/r/296375
[10:50:01] Blocked-on-Operations, Continuous-Integration-Infrastructure, Packaging, Gerrit-Migration, and 2 others: Package xhpast (libphutil) - https://phabricator.wikimedia.org/T137770#2421005 (hashar) ``` $ apt-cache policy arcanist arcanist: Installed: (none) Candidate: 0~git20160620-0wmf1 Version...
[10:50:36] godog: arcanist looks fine. I found out the binary xhpast raises an error on both Trusty and Jessie: libstdc++.so.6: version `GLIBCXX_3.4.21' not found
[10:54:16] hashar: mh ok thanks I'm looking
[10:54:40] godog: Jessie has the same issue apparently
[10:54:44] hashar hi, is this https://github.com/wikimedia/integration-raita even used anymore
[10:54:59] Im asking since im updating git.wikimedia.org references
[10:55:05] and found one in package.json.
[10:55:10] paladox: that is the browser tests dashboard on http://raita.wmflabs.org/
[10:55:14] Oh
[10:55:17] Thanks
[10:55:22] paladox: though it hasn't been updated/tweaked in a while afaik
[10:55:29] oh
[10:55:39] paladox: the canonical repo is integration/raita.git on Gerrit
[10:55:50] ok thanks
[10:55:56] yes looks like it needs updating.
[10:56:00] commit link dosent work
[10:56:19] https://git.wikimedia.org/commit/%2Fmediawiki%2Fextensions%2FFlow/54001a06d49067390e1e7b78a456bc7088da8a16
[10:56:24] Links like ^^
[10:57:19] paladox: well eventually folks will adjust their URL to point to Diffusion instead of the legacy git.wikimedia.org ?
[10:57:32] Yep
[10:57:44] we are going through all lines on repos now.
[10:58:03] 400+ but ostriches said we didnt have too do those ones since there's no point
[10:58:13] so im going through ones that affect the repo
[11:00:30] (PS1) Jcrespo: Disable crons using the phabricator db slave due to maintenance [puppet] - https://gerrit.wikimedia.org/r/296877 (https://phabricator.wikimedia.org/T138460)
[11:02:27] (CR) Jcrespo: "The impact is larger than initially thought, please review the extra processes disabled so I can perform the maintenance (probably, on Mon" [puppet] - https://gerrit.wikimedia.org/r/296877 (https://phabricator.wikimedia.org/T138460) (owner: Jcrespo)
[11:03:17] hashar im wondering if you would be able to review the patch when i upload it please.
[11:03:26] Since i would be unsure if i updated correctly
[11:03:28] please
[11:05:45] hashar https://gerrit.wikimedia.org/r/296879 please
[11:06:11] (PS1) Hashar: nrpe: simplify tests with puppetlabs_spec_helper [puppet] - https://gerrit.wikimedia.org/r/296881
[11:10:06] (PS1) Dereckson: Add contentdm.lib.byu.edu to wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/296882 (https://phabricator.wikimedia.org/T139095)
[11:12:53] (PS1) Hashar: osm: simplify tests with puppetlabs_spec_helper [puppet] - https://gerrit.wikimedia.org/r/296884
[11:13:04] (CR) Muehlenhoff: [C: -1] "One additional rename needed" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/296375 (owner: Alexandros Kosiaris)
[11:14:32] Operations, Traffic, Community-Liaisons (Jul-Sep-2016): Help contact bot owners about the end of HTTP access to the API - https://phabricator.wikimedia.org/T136674#2343854 (Elitre) For the future, it may be useful to use some kind of confirmation system (maybe like the one Commons uses for sysops who...
[11:14:41] (PS2) Dereckson: Add contentdm.lib.byu.edu to wgCopyUploadsDomains [mediawiki-config] - https://gerrit.wikimedia.org/r/296882 (https://phabricator.wikimedia.org/T139095)
[11:15:46] !log removed two obsolete, older kernel packages from wtp1002 (had flagged an icinga warning on diskspace on /boot)
[11:15:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:23:21] Operations: Several mw* servers missing in conftool-data, but present in site.pp - https://phabricator.wikimedia.org/T139154#2421057 (MoritzMuehlenhoff)
[11:23:38] !log bounce statsdlb on graphite1001, drops are back after yesterday's reboot T101141
[11:23:39] T101141: udp rcvbuferrors and inerrors on graphite1001 - https://phabricator.wikimedia.org/T101141
[11:23:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master
[11:24:00] hashar would you be able to review https://gerrit.wikimedia.org/r/296887 please.
[11:24:57] (PS1) Hashar: servermon: simplify tests with puppetlabs_spec_helper [puppet] - https://gerrit.wikimedia.org/r/296888
[11:25:14] akosiaris: gehel I am just going to sprint migrate all the puppet spec to use the puppet labs helper :-)
[11:25:44] hashar: just did the OSM module... committing right now
[11:25:54] hashar: yeah for the cleanup!
[11:25:55] conflict!!! ;-)
[11:26:09] nice way to have conflict!
[11:26:34] and I guess next week we can probably get a Jenkins job that runs all the spec
[11:26:55] gehel: if you want to compare your osm change with mine https://gerrit.wikimedia.org/r/#/c/296884/
[11:27:00] they should be identical ideally
[11:27:34] also I found out that puppet labs module have a metadata.json file that apparently list dependencies
[11:27:55] and maybe the puppet labs spec helper would be able to recognize that instead of having to fill a .fixtures.yml
[11:28:18] hashar: our changes seems identical, I'll just drop mine
[11:28:38] gehel: or use your :)
[11:29:05] metadata.json is mostly broken. As far as I knkow, there is no good way to resolve dependencies listed in metadata.json to something else than puppet forge.
[11:29:34] hashar: things might have changed since last time I looked into this. If you find something that works, let me know!
[11:30:15] mine is not published yet, you win on the speed!
[11:30:39] I am a pro copy paster
[11:31:04] expected ipresolve("2620::861:1:208:80:154:10", "ptr") to have returned "carbon.wikimedia.org" instead of raising RuntimeError(DNS lookup failed for 0.1.0.0.4.5.1.0.0.8.0.0.8.0.2.0.1.0.0.0.1.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa Resolv::DNS::Resource::IN::PTR)
[11:31:06] ..
[11:31:23] (CR) Gehel: [C: 2] osm: simplify tests with puppetlabs_spec_helper [puppet] - https://gerrit.wikimedia.org/r/296884 (owner: Hashar)
[11:37:04] Blocked-on-Operations, Continuous-Integration-Infrastructure, Packaging, Gerrit-Migration, and 2 others: Package xhpast (libphutil) - https://phabricator.wikimedia.org/T137770#2421108 (fgiunchedi) thanks @hashar ! I found some issues with xhpast, addressed in {D289}
[11:38:37] godog: ah great. Now I understand that ${shlibs:Depends} really has a purpose :)
[11:38:59] godog: fun thing is that the .deb package on my Jessie machine at home has xhpast working just fine iirc
[11:40:23] hashar: yeah it is more about xhpast not being cleaned than shlibs:Depends
[11:43:20] ohhh
[11:43:34] godog: the __tests__ have to stick around :(
[11:43:46] godog: inline comment has all the details / references
[11:43:48] that is pretty lame
[11:43:52] well
[11:44:01] not lame but misleading
[11:47:09] (PS1) ArielGlenn: add xml dump config options for order by rev_id for stub dumps [puppet] - https://gerrit.wikimedia.org/r/296892
[11:48:23] hashar: ah ok, thanks for the quick review!
[11:48:33] (CR) jenkins-bot: [V: -1] add xml dump config options for order by rev_id for stub dumps [puppet] - https://gerrit.wikimedia.org/r/296892 (owner: ArielGlenn)
[11:50:19] (PS2) Muehlenhoff: Install fonts-noto-cjk on jessie image scalers [puppet] - https://gerrit.wikimedia.org/r/296237 (https://phabricator.wikimedia.org/T123223)
[11:51:50] godog: would delegate the rest to mukunda to verify/ tests etc later tonight
[11:52:18] and maybe rebuild it myself / verify when I am at home
[11:52:34] (PS4) Alexandros Kosiaris: Introduce $::networks::constants::domain_networks [puppet] - https://gerrit.wikimedia.org/r/296375
[11:54:09] (PS2) ArielGlenn: add xml dump config options for order by rev_id for stub dumps [puppet] - https://gerrit.wikimedia.org/r/296892
[11:54:57] hashar: ok thanks
[11:55:03] (Abandoned) Paladox: Allow git.wikimedia.org/git/passport-mediawiki/git to be redirected properly [puppet] - https://gerrit.wikimedia.org/r/296816 (https://phabricator.wikimedia.org/T137353) (owner: Paladox)
[11:56:59] (CR) Muehlenhoff: [C: 1] "Looks good to me" [puppet] - https://gerrit.wikimedia.org/r/296375 (owner: Alexandros Kosiaris)
[12:01:43] (CR) ArielGlenn: [C: 2] add xml dump config options for order by rev_id for stub dumps [puppet] - https://gerrit.wikimedia.org/r/296892 (owner: ArielGlenn)
[12:03:27] (PS3) Muehlenhoff: Install fonts-noto-cjk on jessie image scalers [puppet] - https://gerrit.wikimedia.org/r/296237 (https://phabricator.wikimedia.org/T123223)
[12:03:44] (CR) Muehlenhoff: [C: 2 V: 2] Install fonts-noto-cjk on jessie image scalers [puppet] - https://gerrit.wikimedia.org/r/296237 (https://phabricator.wikimedia.org/T123223) (owner: Muehlenhoff)
[12:05:20] (PS1) ArielGlenn: fix the var names in the xml dumps conf file for stubs settings [puppet] - https://gerrit.wikimedia.org/r/296897
[12:05:47] (PS2) ArielGlenn: fix the var names in the xml dumps conf file for stubs settings [puppet] - https://gerrit.wikimedia.org/r/296897
[12:07:38] (CR) ArielGlenn: [C: 2] fix the var names in the xml dumps conf file for stubs settings [puppet] - https://gerrit.wikimedia.org/r/296897 (owner: ArielGlenn)
[12:08:27] Operations, Wikimedia-SVG-rendering, Patch-For-Review: Install Noto CJK (Source Han Sans) font family for SVG rendering - https://phabricator.wikimedia.org/T123223#2421198 (MoritzMuehlenhoff) @PhiLiP: fonts-noto-cjk is now installed on the eight image scaling servers running Debian jessie. It's not a...
[12:08:43] gehel: I am willing to abandon Nicko change ( https://gerrit.wikimedia.org/r/#/c/282484/ ) it is mostly superseded by migrating each puppet module to use puppetlabs_spec_helper directly
[12:09:40] gehel: i guessI am going to write some kind words, or maybe I can reuse that change to write the entry point that is going to run all the spec from the root of the repo
[12:11:09] hashar: feel free to drop that change, Nicko won't mind
[12:11:25] hashar: I'll send the kind words...
[12:11:44] 06Operations, 07Puppet, 06Commons, 10Wikimedia-SVG-rendering, and 2 others: Add Gujarati fonts to Wikimedia servers - https://phabricator.wikimedia.org/T129500#2421199 (10MoritzMuehlenhoff) @hashar : Are the fonts on image scalers relevant for CI? We now have eight image scalers running on Debian jessie (w... [12:18:37] (03PS1) 10Elukey: Raise the Hadoop HDFS datanode heapsize to 2GB. [puppet] - 10https://gerrit.wikimedia.org/r/296899 [12:20:34] 06Operations, 10MediaWiki-extensions-UniversalLanguageSelector, 10Wikimedia-SVG-rendering, 07I18n: MB Lateefi Fonts for Sindhi Wikipedia. - https://phabricator.wikimedia.org/T138136#2421217 (10MoritzMuehlenhoff) @mehtab.ahmed : I'm afraid we're missing some language-specific background here, please help us... [12:28:46] 06Operations, 10Wikimedia-SVG-rendering: Install (currently non-existing) Debian packages for PT (paratype) font on image scalars - https://phabricator.wikimedia.org/T97181#1235111 (10MoritzMuehlenhoff) There's a mail from 2010 about packaging that font: https://lists.alioth.debian.org/pipermail/pkg-fonts-deve... 
[12:46:11] (03CR) 10Alexandros Kosiaris: [C: 032] Introduce $::networks::constants::domain_networks [puppet] - 10https://gerrit.wikimedia.org/r/296375 (owner: 10Alexandros Kosiaris) [12:46:18] (03PS5) 10Alexandros Kosiaris: Introduce $::networks::constants::domain_networks [puppet] - 10https://gerrit.wikimedia.org/r/296375 [12:46:27] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Introduce $::networks::constants::domain_networks [puppet] - 10https://gerrit.wikimedia.org/r/296375 (owner: 10Alexandros Kosiaris) [12:47:28] (03PS2) 10Alexandros Kosiaris: Introduce $::networks::constants::labs_networks [puppet] - 10https://gerrit.wikimedia.org/r/296376 [12:47:35] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] Introduce $::networks::constants::labs_networks [puppet] - 10https://gerrit.wikimedia.org/r/296376 (owner: 10Alexandros Kosiaris) [12:48:52] (03PS5) 10Hashar: (WIP) Modification of Rakefile spec entry point (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/282484 (https://phabricator.wikimedia.org/T78342) (owner: 10Nicko) [12:50:14] (03CR) 10jenkins-bot: [V: 04-1] (WIP) Modification of Rakefile spec entry point (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/282484 (https://phabricator.wikimedia.org/T78342) (owner: 10Nicko) [12:51:41] puppet is failing [12:51:45] yes [12:51:47] fixing [12:51:56] ah, so known, np [12:52:08] I stopped icinga-wm on purpose to avoid the spam [12:52:39] (03PS1) 10Alexandros Kosiaris: ferm: fix typo in defs.erb introduced in fec9c9f [puppet] - 10https://gerrit.wikimedia.org/r/296905 [12:53:05] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] ferm: fix typo in defs.erb introduced in fec9c9f [puppet] - 10https://gerrit.wikimedia.org/r/296905 (owner: 10Alexandros Kosiaris) [12:54:08] fixed. 
133 hosts however were faster than me and contacted the puppetmaster already [12:54:26] I 'll leave icinga-wm offline for some 20-30 mins [12:55:07] (03CR) 10Hashar: "Thanks to Nicko original intent, I have eventually converted most of our puppet modules to use puppetlabs_spec_helper . That reduces their" [puppet] - 10https://gerrit.wikimedia.org/r/282484 (https://phabricator.wikimedia.org/T78342) (owner: 10Nicko) [12:59:50] !log rebooting francium for kernel update [12:59:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:00:46] akosiaris: i just got 2 pings from failing puppet on WDQS. I expect it is related to those germ rules? [13:01:46] * gehel does not have laptop right now... [13:02:16] 06Operations, 10Wikimedia-SVG-rendering: Install (currently non-existing) Debian packages for PT (paratype) font on image scalars - https://phabricator.wikimedia.org/T97181#1235111 (10hashar) From the mail a package intent is at http://anonscm.debian.org/cgit/pkg-fonts/ttf-paratype-sans.git There are a cuople... [13:02:39] gehel: ferm, not germ, but yes [13:02:59] if we had germ rules I 'd be afraid [13:03:11] we 've had to sanitize a lot as well [13:03:27] (03PS6) 10Hashar: (WIP) Modification of Rakefile spec entry point (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/282484 (https://phabricator.wikimedia.org/T78342) (owner: 10Nicko) [13:03:47] Damn auto correct, or just too big fingers... [13:03:56] akosiaris: thanks! 
[13:04:31] !log rebooting mira for kernel update [13:04:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:10:50] * jynus ♥ germ rules [13:28:38] !log mw1145 swapped eth0 cable [13:28:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:29:02] 06Operations, 10ops-eqiad: mw1145: eth0 has different negotiated speed than requested - https://phabricator.wikimedia.org/T138937#2421278 (10Cmjohnson) 05Open>03Resolved swapped the cable eth0 [13:37:54] (03PS1) 10ArielGlenn: add defaults for order by rev xml stubs dump config settings [puppet] - 10https://gerrit.wikimedia.org/r/296907 [13:39:30] (03CR) 10ArielGlenn: [C: 032] add defaults for order by rev xml stubs dump config settings [puppet] - 10https://gerrit.wikimedia.org/r/296907 (owner: 10ArielGlenn) [13:41:23] (03PS1) 10Cmjohnson: mw1120 mgmt fwd address was removed from wmet file [dns] - 10https://gerrit.wikimedia.org/r/296908 [13:41:43] (03PS6) 10ArielGlenn: add ability to do xml stubs dump pieces based on revs per page range [dumps] - 10https://gerrit.wikimedia.org/r/296742 (https://phabricator.wikimedia.org/T137887) [13:42:24] (03CR) 10Cmjohnson: [C: 032] mw1120 mgmt fwd address was removed from wmet file [dns] - 10https://gerrit.wikimedia.org/r/296908 (owner: 10Cmjohnson) [13:45:56] Hi cmjohnson1! Hope that you are feeling better :) You are probably full of things to do today, but if you have time I have a couple of hosts in trouble: mw130[156].eqiad.wmnet and analytics1049 [13:46:18] sorry, mw130[256] [13:46:28] mw1301 is fine :) [13:46:39] elukey: I am working through all of the mw's now [13:47:03] super thanks!
[13:47:34] !log rebooting osmium for kernel update [13:47:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:50:12] akosiaris: thanks for merging:) no more ./misc yay [13:51:03] (we don't do import 'misc/*.pp in site.pp anymore) [13:51:18] since today [13:53:22] 06Operations, 10ops-eqiad, 13Patch-For-Review: mw1103, mw1120 , mw1259 stuck after reboot - https://phabricator.wikimedia.org/T138921#2421302 (10Cmjohnson) mw1120 forward entry was missing, added back https://gerrit.wikimedia.org/r/#/c/296908/1 mw1103, was hung up, rebooted and now is accessible mw1159 did... [13:54:47] 06Operations, 10ops-eqiad, 13Patch-For-Review: mw1103, mw1120 , mw1259 stuck after reboot - https://phabricator.wikimedia.org/T138921#2421303 (10Cmjohnson) a:05Cmjohnson>03MoritzMuehlenhoff Assigning to @MoritzMuehlenhoff or @elukey to add back to dsh groups. [13:55:58] !log rebooting codfw poolcounters for kernel update [13:56:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [13:56:42] (03CR) 10ArielGlenn: [C: 032] "Fully tested, so in it goes. 
Note that at the moment this is enabled (via puppet) only for wikidatawiki, due to inconsistent indices acro" [dumps] - 10https://gerrit.wikimedia.org/r/296742 (https://phabricator.wikimedia.org/T137887) (owner: 10ArielGlenn) [13:57:27] (03PS3) 10Dzahn: phab: only mirror refs/heads/ and ./tags/ for mwcore and ops/puppet [puppet] - 10https://gerrit.wikimedia.org/r/295011 (owner: 10Paladox) [13:57:42] (03CR) 10Dzahn: "This is for the releng team" [puppet] - 10https://gerrit.wikimedia.org/r/295011 (owner: 10Paladox) [13:58:55] (03PS1) 10ArielGlenn: re-enable full dumps cron job for first of the month [puppet] - 10https://gerrit.wikimedia.org/r/296916 [14:03:47] (03CR) 10ArielGlenn: [C: 032] re-enable full dumps cron job for first of the month [puppet] - 10https://gerrit.wikimedia.org/r/296916 (owner: 10ArielGlenn) [14:05:08] (03CR) 10Elukey: "Puppet compiler: https://puppet-compiler.wmflabs.org/3256/" [puppet] - 10https://gerrit.wikimedia.org/r/296899 (owner: 10Elukey) [14:07:37] (03PS2) 10Elukey: Raise the Hadoop HDFS datanode heapsize to 2GB. [puppet] - 10https://gerrit.wikimedia.org/r/296899 (https://phabricator.wikimedia.org/T139071) [14:09:27] (03PS1) 10Muehlenhoff: Readd three servers to dsh [puppet] - 10https://gerrit.wikimedia.org/r/296917 [14:10:21] 06Operations, 07Graphite, 05MW-1.27-release-notes, 13Patch-For-Review: udp rcvbuferrors and inerrors on graphite1001 - https://phabricator.wikimedia.org/T101141#2421342 (10fgiunchedi) this issue seems to be back after yesterday's reboot of graphite1001 for a trusty kernel upgrade: ``` graphite1001:~$ grep... 
[14:10:57] (03CR) 10Muehlenhoff: [C: 032 V: 032] Readd three servers to dsh [puppet] - 10https://gerrit.wikimedia.org/r/296917 (owner: 10Muehlenhoff) [14:15:18] !log rearmed keyholder on mira after reboot [14:15:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:16:42] 06Operations, 10ops-eqiad, 13Patch-For-Review: mw1103, mw1120 , mw1259 stuck after reboot - https://phabricator.wikimedia.org/T138921#2421372 (10MoritzMuehlenhoff) 05Open>03Resolved Ok, repooled where previously depooled and readded to dsh via https://gerrit.wikimedia.org/r/296917 [14:17:03] 06Operations, 06Analytics-Kanban, 10Traffic, 13Patch-For-Review: Verify why varnishkafka stats and webrequest logs count differs - https://phabricator.wikimedia.org/T136314#2421374 (10elukey) Ran an experiment on cp300[89] and the following query removes all the occurrences of VSL timeout: ``` varnishlog... [14:21:52] !log enable another statsdlb instance temporarily on graphite1001 to investigate drops [14:23:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:23:45] morebots: srsly two minutes? :( [14:23:45] I am a logbot running on tools-exec-1211. [14:23:45] Messages are logged to wikitech.wikimedia.org/wiki/Server_Admin_Log. [14:23:45] To log a message, type !log . [14:29:32] 06Operations, 10ops-eqiad, 13Patch-For-Review: mw1223 completely hangs - https://phabricator.wikimedia.org/T138930#2421413 (10Cmjohnson) a:05Cmjohnson>03MoritzMuehlenhoff Hard powercycle performed. All is well @MoritzMuehlenhoff please add back to dsh group [14:30:40] 06Operations, 10ops-eqiad, 13Patch-For-Review: Broken memory on mw1217 - https://phabricator.wikimedia.org/T138925#2421416 (10Cmjohnson) Server is out of warranty. I will rummage through the decommission servers to see if I can find a spare. [14:32:17] elukey: mw130[2,5,6] did not have any issues. I installed OS on 1302 and 13-5-6 are finishing up now. 
I will leave it up to you to add puppet certs if that's okay? [14:33:15] RECOVERY - DPKG on mw1091 is OK: All packages OK [14:33:16] PROBLEM - puppet last run on mw1091 is CRITICAL: Connection refused by host [14:33:26] PROBLEM - Disk space on mw1091 is CRITICAL: Connection refused by host [14:33:30] cmjohnson1: ah ok sorry about that! I saw the ticket opened for 5/6 and wasn't able to use mgmt for 1302. Probably temporarily, I'll finish their install sure! [14:34:03] 1302 didn't have any issues either....i used Dell's racadm controls [14:34:16] RECOVERY - dhclient process on mw1090 is OK: PROCS OK: 0 processes with command name dhclient [14:34:17] RECOVERY - Apache HTTP on mw1090 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 8.367 second response time [14:34:17] RECOVERY - HHVM rendering on mw1091 is OK: HTTP OK: HTTP/1.1 200 OK - 72141 bytes in 9.285 second response time [14:34:23] my bad then, not sure why :( [14:34:25] PROBLEM - HHVM processes on mw1090 is CRITICAL: Connection refused by host [14:34:35] PROBLEM - salt-minion processes on mw1090 is CRITICAL: Connection refused by host [14:34:55] RECOVERY - nutcracker port on mw1091 is OK: TCP OK - 0.000 second response time on port 11212 [14:35:36] RECOVERY - puppet last run on mw1091 is OK: OK: Puppet is currently enabled, last run 21 minutes ago with 0 failures [14:36:36] RECOVERY - puppet last run on mw1152 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [14:36:57] RECOVERY - dhclient process on mw1091 is OK: PROCS OK: 0 processes with command name dhclient [14:37:15] RECOVERY - Apache HTTP on mw1091 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 1.050 second response time [14:38:06] RECOVERY - Disk space on mw1091 is OK: DISK OK [14:38:26] PROBLEM - HHVM processes on mw1091 is CRITICAL: Connection refused by host [14:38:56] PROBLEM - Apache HTTP on mw1090 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:38:56] RECOVERY - DPKG on mw1090 
is OK: All packages OK [14:39:05] RECOVERY - HHVM processes on mw1090 is OK: PROCS OK: 6 processes with command name hhvm [14:39:36] PROBLEM - mediawiki-installation DSH group on mw1103 is CRITICAL: Host mw1103 is not in mediawiki-installation dsh group [14:41:24] cmjohnson1: --^ are these mw130[56]? [14:41:35] https://phabricator.wikimedia.org/T134309#2338122 [14:41:40] the DNS looks wrong.. [14:41:46] PROBLEM - Apache HTTP on mw1091 is CRITICAL: Connection refused [14:41:49] elukey: I think they may be [14:41:53] (03PS1) 10Muehlenhoff: Readd mw1223 to dsh group [puppet] - 10https://gerrit.wikimedia.org/r/296919 [14:41:55] RECOVERY - HHVM rendering on mw1090 is OK: HTTP OK: HTTP/1.1 200 OK - 72141 bytes in 8.745 second response time [14:42:09] also https://phabricator.wikimedia.org/T134309#2390881 [14:42:16] PROBLEM - nutcracker port on mw1090 is CRITICAL: Connection refused by host [14:42:22] (03CR) 10JanZerebecki: "Need to go, so only reviewed the first files." (035 comments) [debs/geckodriver] - 10https://gerrit.wikimedia.org/r/294293 (https://phabricator.wikimedia.org/T137797) (owner: 10Hashar) [14:42:58] elukey: I am not sure why they're showing up like that...they're installed w/correct name [14:42:59] (03CR) 10Muehlenhoff: [C: 032 V: 032] Readd mw1223 to dsh group [puppet] - 10https://gerrit.wikimedia.org/r/296919 (owner: 10Muehlenhoff) [14:43:26] PROBLEM - HHVM rendering on mw1091 is CRITICAL: Connection refused [14:43:55] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge.
[14:43:56] PROBLEM - dhclient process on mw1091 is CRITICAL: Connection refused by host [14:44:04] 06Operations, 10ops-eqiad, 13Patch-For-Review: mw1223 completely hangs - https://phabricator.wikimedia.org/T138930#2421447 (10MoritzMuehlenhoff) 05Open>03Resolved p:05Triage>03Normal Repooled and readded to dsh via https://gerrit.wikimedia.org/r/#/c/296919/ [14:44:25] PROBLEM - Check size of conntrack table on mw1090 is CRITICAL: Connection refused by host [14:44:51] hrm.....this is strange...1090 is in the rack and connected...is @moritzm doing anything? [14:45:46] PROBLEM - configured eth on mw1091 is CRITICAL: Connection refused by host [14:45:47] RECOVERY - Apache HTTP on mw1090 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 1.055 second response time [14:45:47] PROBLEM - salt-minion processes on mw1091 is CRITICAL: Connection refused by host [14:45:57] PROBLEM - ElasticSearch health check for shards on nobelium is CRITICAL: CRITICAL - elasticsearch http://10.64.37.14:9200/_cluster/health error while fetching: HTTPConnectionPool(host=10.64.37.14, port=9200): Max retries exceeded with url: /_cluster/health (Caused by class socket.error: [Errno 111] Connection refused) [14:46:07] RECOVERY - salt-minion processes on mw1090 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [14:46:11] (03PS1) 10Filippo Giunchedi: use SO_REUSEPORT in code, not SO_REUSEADDR [software/statsdlb] - 10https://gerrit.wikimedia.org/r/296920 [14:46:15] elukey: they're both using the same ip [14:46:19] cmjohnson1: nope, not me [14:46:31] (03PS1) 10Dzahn: icinga/mariadb: move plugin into module [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/296921 [14:46:48] that seems to have been a pre-existing problem, see https://phabricator.wikimedia.org/T134309#2338122 and https://phabricator.wikimedia.org/T134309#2390881 [14:48:06] RECOVERY - salt-minion processes on mw1091 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[14:48:07] 06Operations, 10Wikimedia-Apache-configuration, 07HHVM, 07Wikimedia-log-errors: Fix Apache proxy_fcgi error "Invalid argument: AH01075: Error dispatching request to" (Causing HTTP 503) - https://phabricator.wikimedia.org/T73487#2421490 (10elukey) >>! In T73487#2393128, @Joe wrote: > The "got bogus version... [14:48:47] PROBLEM - HHVM rendering on mw1090 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:49:01] (03PS1) 10Cmjohnson: Fixing duplicated use of ip's for mw1305/6. [dns] - 10https://gerrit.wikimedia.org/r/296922 [14:49:16] RECOVERY - nutcracker port on mw1090 is OK: TCP OK - 0.000 second response time on port 11212 [14:49:19] (03PS1) 10Dzahn: icinga: move check_mariadb plugin into module [puppet] - 10https://gerrit.wikimedia.org/r/296923 [14:49:36] PROBLEM - puppet last run on mw1091 is CRITICAL: Timeout while attempting connection [14:49:46] PROBLEM - Disk space on mw1091 is CRITICAL: Timeout while attempting connection [14:49:56] RECOVERY - HHVM processes on mw1091 is OK: PROCS OK: 6 processes with command name hhvm [14:50:12] (03CR) 10Cmjohnson: [C: 032] Fixing duplicated use of ip's for mw1305/6. 
[dns] - 10https://gerrit.wikimedia.org/r/296922 (owner: 10Cmjohnson) [14:50:16] RECOVERY - configured eth on mw1091 is OK: OK - interfaces up [14:50:17] (03PS2) 10Dzahn: icinga/mariadb: move plugin into module [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/296921 [14:50:26] RECOVERY - HHVM rendering on mw1091 is OK: HTTP OK: HTTP/1.1 200 OK - 72138 bytes in 2.141 second response time [14:50:27] PROBLEM - Apache HTTP on mw1090 is CRITICAL: CRITICAL - Socket timeout after 10 seconds [14:50:30] elukey: i removed the cables for 1305 and 6 [14:50:37] updating dns now [14:50:46] RECOVERY - dhclient process on mw1091 is OK: PROCS OK: 0 processes with command name dhclient [14:51:05] RECOVERY - Apache HTTP on mw1091 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 1.044 second response time [14:51:06] RECOVERY - HHVM rendering on mw1090 is OK: HTTP OK: HTTP/1.1 200 OK - 72138 bytes in 9.905 second response time [14:51:10] cmjohnson1 super.. should we depool 1090/91? [14:51:16] RECOVERY - Check size of conntrack table on mw1090 is OK: OK: nf_conntrack is 0 % full [14:51:35] I don't think so...looks to be recovering [14:51:43] but you would know better [14:51:51] moritzm: ? 
[14:51:56] RECOVERY - Disk space on mw1091 is OK: DISK OK [14:52:45] RECOVERY - Apache HTTP on mw1090 is OK: HTTP OK: HTTP/1.1 301 Moved Permanently - 628 bytes in 4.052 second response time [14:53:14] elukey, cmjohnson1: they seem fine, let's leave them pooled [14:53:24] (03PS2) 10Filippo Giunchedi: use SO_REUSEPORT in code, not SO_REUSEADDR [software/statsdlb] - 10https://gerrit.wikimedia.org/r/296920 (https://phabricator.wikimedia.org/T126447) [14:53:31] yeah I was just checking them via ssh [14:53:33] looks fine [14:53:38] (1090/1091 I mean) [14:54:04] 06Operations, 10Traffic, 06Community-Liaisons (Jul-Sep-2016): Help contact bot owners about the end of HTTP access to the API - https://phabricator.wikimedia.org/T136674#2421525 (10Dzahn) For the suggestion by Elitre, we could possibly use the "L" system here in Phabricator. Because that is what it does, let... [14:56:30] 06Operations, 10ops-eqiad: Rack/Setup Carbon/Apt Server Replacement - https://phabricator.wikimedia.org/T139171#2421526 (10Cmjohnson) [14:56:58] 06Operations, 10ops-eqiad: Rack/Setup Carbon/Apt Server Replacement - https://phabricator.wikimedia.org/T139171#2421539 (10Cmjohnson) [14:57:05] RECOVERY - Disk space on elastic1022 is OK: DISK OK [14:57:11] !log upgraded and restarted elastic on nobelium@eqiad [14:57:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [14:57:17] RECOVERY - puppet last run on mw1090 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:01:26] 06Operations, 10Traffic, 06Community-Liaisons (Jul-Sep-2016): Help contact bot owners about the end of HTTP access to the API - https://phabricator.wikimedia.org/T136674#2421557 (10Elitre) If the interested parties can see a list of everyone who's already signed, then that's a possibility (although I can't p... 
[15:02:36] RECOVERY - ElasticSearch health check for shards on nobelium is OK: OK - elasticsearch status labsearch: status: yellow, number_of_nodes: 1, unassigned_shards: 1, number_of_pending_tasks: 0, number_of_in_flight_fetch: 0, timed_out: False, active_primary_shards: 504, task_max_waiting_in_queue_millis: 0, cluster_name: labsearch, relocating_shards: 0, active_shards_percent_as_number: 99.801980198, active_shards: 504, initializing_shards: [15:08:46] RECOVERY - mediawiki-installation DSH group on mw1120 is OK: OK [15:10:47] (03PS2) 10Alexandros Kosiaris: nrpe: simplify tests with puppetlabs_spec_helper [puppet] - 10https://gerrit.wikimedia.org/r/296881 (owner: 10Hashar) [15:10:55] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] nrpe: simplify tests with puppetlabs_spec_helper [puppet] - 10https://gerrit.wikimedia.org/r/296881 (owner: 10Hashar) [15:14:20] (03PS10) 10Alexandros Kosiaris: Check all ores web nodes [puppet] - 10https://gerrit.wikimedia.org/r/296535 (https://phabricator.wikimedia.org/T134782) (owner: 10Nschaaf) [15:15:56] akosiaris: and there is another one for servermon module : https://gerrit.wikimedia.org/r/#/c/296888/ ;) [15:16:15] RECOVERY - puppet last run on mw1091 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [15:19:14] (03PS11) 10Alexandros Kosiaris: Check all ores web nodes [puppet] - 10https://gerrit.wikimedia.org/r/296535 (https://phabricator.wikimedia.org/T134782) (owner: 10Nschaaf) [15:23:52] (03CR) 10Filippo Giunchedi: "regardless of this fix, there would be still puppet work to do to have multiple instances of statsdlb." 
[software/statsdlb] - 10https://gerrit.wikimedia.org/r/296920 (https://phabricator.wikimedia.org/T126447) (owner: 10Filippo Giunchedi) [15:25:45] (03CR) 10Dzahn: [C: 031] "http://puppet-compiler.wmflabs.org/3259/neon.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/296535 (https://phabricator.wikimedia.org/T134782) (owner: 10Nschaaf) [15:26:23] (03PS12) 10Dzahn: icinga: check all ores web nodes [puppet] - 10https://gerrit.wikimedia.org/r/296535 (https://phabricator.wikimedia.org/T134782) (owner: 10Nschaaf) [15:30:36] (03PS1) 10Hashar: rsync: fix spec that used ruby 1.8 syntax [puppet] - 10https://gerrit.wikimedia.org/r/296927 (https://phabricator.wikimedia.org/T78342) [15:31:33] (03CR) 10Hashar: "One more puppet module passing :)" [puppet] - 10https://gerrit.wikimedia.org/r/296927 (https://phabricator.wikimedia.org/T78342) (owner: 10Hashar) [15:32:16] PROBLEM - Unmerged changes on repository puppet on strontium is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). [15:33:59] 06Operations, 10Wikimedia-SVG-rendering, 13Patch-For-Review: Install Noto CJK (Source Han Sans) font family for SVG rendering - https://phabricator.wikimedia.org/T123223#2421627 (10PhiLiP) @MoritzMuehlenhoff Great! Thanks for the work you did. I'll post this news to Chinese community once you get rid of thos... [15:35:09] 06Operations, 10Traffic, 06Community-Liaisons (Jul-Sep-2016): Help contact bot owners about the end of HTTP access to the API - https://phabricator.wikimedia.org/T136674#2421629 (10Whatamidoing-WMF) That's a potentially useful system, and I'm glad to know that it exists. I'm not sure that it's necessary for... 
[15:37:27] (03PS1) 10Jforrester: Switch VisualEditor to a negative rather than positive dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296929 [15:37:56] (03PS1) 10Jforrester: Delete no-longer-user visualeditor-default.dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296930 [15:38:06] (03CR) 10jenkins-bot: [V: 04-1] Switch VisualEditor to a negative rather than positive dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296929 (owner: 10Jforrester) [15:38:37] (03CR) 10jenkins-bot: [V: 04-1] Delete no-longer-user visualeditor-default.dblist [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296930 (owner: 10Jforrester) [15:39:15] 06Operations, 10MediaWiki-extensions-UniversalLanguageSelector, 10Wikimedia-SVG-rendering, 07I18n: MB Lateefi Fonts for Sindhi Wikipedia. - https://phabricator.wikimedia.org/T138136#2421634 (10mehtab.ahmed) @MoritzMuehlenhoff going with Lateef is better option, since we don't have any fonts. It's very sad... [15:40:54] RECOVERY - mediawiki-installation DSH group on mw1103 is OK: OK [15:46:47] (03CR) 10Alexandros Kosiaris: "I made a minor change, so the change should now compile and the tests should show up. Do we still want this, despite being a bit unclean ?" 
[puppet] - 10https://gerrit.wikimedia.org/r/296535 (https://phabricator.wikimedia.org/T134782) (owner: 10Nschaaf) [15:47:39] (03PS1) 10Merlijn van Deen: toollabs: Allow HBA login to all hosts [puppet] - 10https://gerrit.wikimedia.org/r/296932 (https://phabricator.wikimedia.org/T104613) [15:47:46] (03CR) 10Alexandros Kosiaris: [C: 032] rsync: fix spec that used ruby 1.8 syntax [puppet] - 10https://gerrit.wikimedia.org/r/296927 (https://phabricator.wikimedia.org/T78342) (owner: 10Hashar) [15:48:11] 06Operations, 10Wikimedia-Apache-configuration, 07HHVM, 07Wikimedia-log-errors: Fix Apache proxy_fcgi error "Invalid argument: AH01075: Error dispatching request to" (Causing HTTP 503) - https://phabricator.wikimedia.org/T73487#2421663 (10elukey) Also the Bogus request id is surely a bug since you can read... [15:48:36] 06Operations, 10Traffic, 06Community-Liaisons (Jul-Sep-2016): Help contact bot owners about the end of HTTP access to the API - https://phabricator.wikimedia.org/T136674#2421664 (10Elitre) It's for us: it's so that we know at a glance who's been contacted, who has acknowledged there's action required on thei... [15:48:55] akosiaris: that is all for today. Thanks a ton for all the reviews/tests/merges etc ;} [15:49:15] hashar: \o/ [15:49:17] thanks as well [15:50:26] akosiaris: the mysql module is out of order for sure :( [15:50:34] RECOVERY - Unmerged changes on repository puppet on strontium is OK: No changes to merge. [15:52:35] (03PS1) 10Hashar: mysql: fix spec [puppet] - 10https://gerrit.wikimedia.org/r/296933 (https://phabricator.wikimedia.org/T78342) [15:52:44] that is all for today. Have a good week-end! 
[15:53:17] !log temporarily run 3x statsdlb instances on graphite1001 to minimise drops - T101141 [15:53:18] T101141: udp rcvbuferrors and inerrors on graphite1001 - https://phabricator.wikimedia.org/T101141 [15:53:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [15:55:52] (03CR) 10Jcrespo: "If this helps mysql and coredb modules on puppet have been deprecated on production, and the only reason they are still there is in case t" [puppet] - 10https://gerrit.wikimedia.org/r/296933 (https://phabricator.wikimedia.org/T78342) (owner: 10Hashar) [16:07:05] PROBLEM - mobileapps endpoints health on scb1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [16:08:40] (03PS1) 10Chad: Gerrit: Rename role from ::production to ::server [puppet] - 10https://gerrit.wikimedia.org/r/296936 [16:09:23] mutante: Hehe ^ :) [16:09:24] RECOVERY - mobileapps endpoints health on scb1002 is OK: All endpoints are healthy [16:11:01] Biggest blocker to labs is A) The hardcoded IPs in the role and no varying on installing the cert name by $host yet, basically requires you to have a ssl cert for the host (can't fake it with snakeoil really) [16:11:20] Eh, there was supposed to be a B) between "and" and "no" [16:16:30] yuvipanda: Looks like the mobileapps flapping is caused by ORES consuming a ton of memory on the scb100[12] boxes. https://phabricator.wikimedia.org/T139177 [16:17:21] (03PS1) 10Alexandros Kosiaris: package_builder: use ${BUILDRESULT} only if it exists [puppet] - 10https://gerrit.wikimedia.org/r/296937 [16:18:08] (03CR) 10Alexandros Kosiaris: [C: 032 V: 032] package_builder: use ${BUILDRESULT} only if it exists [puppet] - 10https://gerrit.wikimedia.org/r/296937 (owner: 10Alexandros Kosiaris) [16:18:18] yuvipanda: are you doing ORES? Or should I ping halfak instead? [16:18:27] bearND you should ping halfak [16:18:40] yuvipanda: ok. 
thanks [16:31:14] PROBLEM - Disk space on elastic1017 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 79124 MB (15% inode=99%) [16:36:45] (03PS2) 10Dzahn: Gerrit: Rename role from ::production to ::server [puppet] - 10https://gerrit.wikimedia.org/r/296936 (owner: 10Chad) [16:36:53] (03CR) 10Dzahn: [C: 032] "http://puppet-compiler.wmflabs.org/3260/ytterbium.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/296936 (owner: 10Chad) [16:41:14] PROBLEM - puppet last run on db2040 is CRITICAL: CRITICAL: puppet fail [16:45:16] (03CR) 10Greg Grossmeier: "No need to add me to these changes. Please don't in the future." [puppet] - 10https://gerrit.wikimedia.org/r/295011 (owner: 10Paladox) [16:45:48] (03CR) 10Paladox: "Sorry." [puppet] - 10https://gerrit.wikimedia.org/r/295011 (owner: 10Paladox) [16:48:02] 06Operations, 10ops-eqiad: Rack/Setup Carbon/Apt Server Replacement - https://phabricator.wikimedia.org/T139171#2421803 (10RobH) This will require 10GbE, and also will be an apt-mirror, not our actual apt-server. (That wording was my mistake in earlier tasks.) So it won't replace carbon entirely, but some of... [16:48:21] (03CR) 10Dzahn: "submitted now. got distracted while waiting for jenkins. ran puppet on ytterbium. we have a follow-up ..it looks" [puppet] - 10https://gerrit.wikimedia.org/r/296936 (owner: 10Chad) [16:48:36] ostriches: needs a follow-up.. even though it seemed ok in compiler.. [16:48:47] or.. wait [16:49:01] Oh I didn't try compiler yet. [16:49:01] maybe it was just that one run during the rename [16:49:04] i did [16:49:12] running it again [16:49:37] ok, false alarm , it's fine on next puppet run [16:49:49] it was just some race during the rename of the class [16:50:01] done [16:50:21] (03CR) 10Dzahn: "fine on ytterbium on next puppet run" [puppet] - 10https://gerrit.wikimedia.org/r/296936 (owner: 10Chad) [16:54:46] Ah ok [16:54:50] Yeah races are hard. 
[16:59:32] (03PS1) 10Yuvipanda: Revert "Revert "labs: Add support for custom cnames in labs recursor"" [puppet] - 10https://gerrit.wikimedia.org/r/296939 [17:00:06] (03CR) 10jenkins-bot: [V: 04-1] Revert "Revert "labs: Add support for custom cnames in labs recursor"" [puppet] - 10https://gerrit.wikimedia.org/r/296939 (owner: 10Yuvipanda) [17:04:01] (03PS2) 10Yuvipanda: Revert "Revert "labs: Add support for custom cnames in labs recursor"" [puppet] - 10https://gerrit.wikimedia.org/r/296939 [17:05:11] (03CR) 10jenkins-bot: [V: 04-1] Revert "Revert "labs: Add support for custom cnames in labs recursor"" [puppet] - 10https://gerrit.wikimedia.org/r/296939 (owner: 10Yuvipanda) [17:07:18] (03PS13) 10Dzahn: icinga: check all ores web nodes [puppet] - 10https://gerrit.wikimedia.org/r/296535 (https://phabricator.wikimedia.org/T134782) (owner: 10Nschaaf) [17:08:02] (03CR) 10Dzahn: "yea, now that multiple people put energy into it, let's add it" [puppet] - 10https://gerrit.wikimedia.org/r/296535 (https://phabricator.wikimedia.org/T134782) (owner: 10Nschaaf) [17:08:55] RECOVERY - puppet last run on db2040 is OK: OK: Puppet is currently enabled, last run 2 minutes ago with 0 failures [17:09:04] PROBLEM - puppet last run on cp4020 is CRITICAL: CRITICAL: Puppet has 1 failures [17:11:42] (03PS1) 10Chad: Gerrit: Prepare lead hiera for migration [puppet] - 10https://gerrit.wikimedia.org/r/296940 [17:14:22] (03PS2) 10Dzahn: Gerrit: Prepare lead hiera for migration [puppet] - 10https://gerrit.wikimedia.org/r/296940 (https://phabricator.wikimedia.org/T126794) (owner: 10Chad) [17:14:38] (03CR) 10Dzahn: "just linking to a ticket (T126794)" [puppet] - 10https://gerrit.wikimedia.org/r/296940 (https://phabricator.wikimedia.org/T126794) (owner: 10Chad) [17:15:01] (03CR) 10Yuvipanda: [C: 04-1] "minor change due to recent changes, otherwise lgtm" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/291710 (owner: 10Filippo Giunchedi) [17:15:05] (03CR) 10Dzahn: [C: 032] Gerrit: Prepare lead 
hiera for migration [puppet] - 10https://gerrit.wikimedia.org/r/296940 (https://phabricator.wikimedia.org/T126794) (owner: 10Chad) [17:16:35] (03CR) 10Dzahn: [C: 032] "http://puppet-compiler.wmflabs.org/3261/neon.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/296535 (https://phabricator.wikimedia.org/T134782) (owner: 10Nschaaf) [17:16:50] (03PS14) 10Dzahn: icinga: check all ores web nodes [puppet] - 10https://gerrit.wikimedia.org/r/296535 (https://phabricator.wikimedia.org/T134782) (owner: 10Nschaaf) [17:17:09] (03PS3) 10Yuvipanda: Revert "Revert "labs: Add support for custom cnames in labs recursor"" [puppet] - 10https://gerrit.wikimedia.org/r/296939 [17:17:30] (03PS4) 10Yuvipanda: Revert "Revert "labs: Add support for custom cnames in labs recursor"" [puppet] - 10https://gerrit.wikimedia.org/r/296939 [17:20:35] (03CR) 10Yuvipanda: [C: 032] Revert "Revert "labs: Add support for custom cnames in labs recursor"" [puppet] - 10https://gerrit.wikimedia.org/r/296939 (owner: 10Yuvipanda) [17:21:09] (03PS15) 10Dzahn: icinga: check all ores web nodes [puppet] - 10https://gerrit.wikimedia.org/r/296535 (https://phabricator.wikimedia.org/T134782) (owner: 10Nschaaf) [17:23:04] PROBLEM - puppet last run on mw1091 is CRITICAL: CRITICAL: Puppet has 3 failures [17:28:15] (03PS5) 10Filippo Giunchedi: prometheus: add tools role [puppet] - 10https://gerrit.wikimedia.org/r/291710 [17:29:18] (03PS6) 10Yuvipanda: prometheus: add tools role [puppet] - 10https://gerrit.wikimedia.org/r/291710 (owner: 10Filippo Giunchedi) [17:29:26] (03CR) 10Yuvipanda: [C: 032 V: 032] prometheus: add tools role [puppet] - 10https://gerrit.wikimedia.org/r/291710 (owner: 10Filippo Giunchedi) [17:34:02] RECOVERY - puppet last run on cp4020 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [17:42:17] (03CR) 10Dzahn: "puppet part works fine, checks got added on neon. 
but they get a 404" [puppet] - 10https://gerrit.wikimedia.org/r/296535 (https://phabricator.wikimedia.org/T134782) (owner: 10Nschaaf) [17:42:39] (03PS1) 10Yuvipanda: labsdns: Allow returning arbitrary A records, not CNAMEs [puppet] - 10https://gerrit.wikimedia.org/r/296941 [17:44:06] (03CR) 10jenkins-bot: [V: 04-1] labsdns: Allow returning arbitrary A records, not CNAMEs [puppet] - 10https://gerrit.wikimedia.org/r/296941 (owner: 10Yuvipanda) [17:44:21] (03PS2) 10Yuvipanda: labsdns: Allow returning arbitrary A records, not CNAMEs [puppet] - 10https://gerrit.wikimedia.org/r/296941 [17:45:22] (03PS3) 10Yuvipanda: labsdns: Allow returning arbitrary A records, not CNAMEs [puppet] - 10https://gerrit.wikimedia.org/r/296941 [17:45:41] (03Abandoned) 10Dzahn: rm misc::monitoring::view::hadoop [puppet] - 10https://gerrit.wikimedia.org/r/295988 (owner: 10Dzahn) [17:46:12] RECOVERY - puppet last run on mw1091 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [17:46:22] (03PS4) 10Yuvipanda: labsdns: Allow returning arbitrary A records, not CNAMEs [puppet] - 10https://gerrit.wikimedia.org/r/296941 (https://phabricator.wikimedia.org/T139190) [17:46:41] (03PS5) 10Yuvipanda: labsdns: Allow returning arbitrary A records, not CNAMEs [puppet] - 10https://gerrit.wikimedia.org/r/296941 (https://phabricator.wikimedia.org/T139190) [17:47:13] (03CR) 10Yuvipanda: [C: 032 V: 032] labsdns: Allow returning arbitrary A records, not CNAMEs [puppet] - 10https://gerrit.wikimedia.org/r/296941 (https://phabricator.wikimedia.org/T139190) (owner: 10Yuvipanda) [17:47:28] !log restart elasticsearch on elastic1017 to attempt to clear up a continuous backlog of relocating shards [17:47:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [17:49:33] RECOVERY - Disk space on elastic1017 is OK: DISK OK [17:54:56] (03PS1) 10Dzahn: lint:ignore the last puppet URL without modules [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/296943 [17:55:39] 
(03PS2) 10Dzahn: lint:ignore the last puppet URL without modules [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/296943 [17:58:19] (03CR) 10Dzahn: [C: 032] lint:ignore the last puppet URL without modules [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/296943 (owner: 10Dzahn) [18:00:30] (03PS1) 10Dzahn: mariadb: update subdmodule for lint-ignore [puppet] - 10https://gerrit.wikimedia.org/r/296945 [18:03:18] (03PS2) 10Dzahn: mariadb: update subdmodule for lint-ignore [puppet] - 10https://gerrit.wikimedia.org/r/296945 [18:03:56] (03PS3) 10Dzahn: mariadb: update subdmodule for lint-ignore [puppet] - 10https://gerrit.wikimedia.org/r/296945 [18:04:38] (03CR) 10Dzahn: [C: 032] mariadb: update subdmodule for lint-ignore [puppet] - 10https://gerrit.wikimedia.org/r/296945 (owner: 10Dzahn) [18:11:32] (03PS1) 10Dzahn: Revert "mariadb: update subdmodule for lint-ignore" [puppet] - 10https://gerrit.wikimedia.org/r/296946 [18:11:34] you are committing a different changeset [18:11:48] you are trying to pull 25bd279384f22e18938a576bed6e92344a1ab4f0 [18:11:54] but that was not merged [18:12:01] 6e14c9effbf5dbd75ec6ac3d5d50020850df65c9 was [18:12:58] but when i look at 25bd279384f22e18938a576bed6e92344a1ab4f0 it says merged [18:13:08] and i compared git log to what i saw in gerrit? [18:13:10] you probably edited the text with gerrit? [18:13:24] then pushed the local change? [18:13:38] to the other repo [18:13:49] should i first revert ? [18:14:23] it doesn't matter much, I think [18:14:58] just make sure to fetch and rebase your local env before merging [18:15:00] i am still confused that it says 25bd.. is merged [18:15:05] (03CR) 10Dzahn: [C: 032] Revert "mariadb: update subdmodule for lint-ignore" [puppet] - 10https://gerrit.wikimedia.org/r/296946 (owner: 10Dzahn) [18:15:13] nice, grrit-wm [18:16:01] I need to get rid of that submodule [18:16:07] there is no reason to have that [18:16:35] sorry, I mean subrepo [18:17:05] oh, ok . 
so there is just one reason i touch that [18:17:19] the very last of that type of warnings across everything [18:17:45] and then i can remove an exception from the global config file [18:17:45] no, the change is ok [18:17:53] I have plans to get rid of those errors [18:18:09] with https://gerrit.wikimedia.org/r/295654 [18:18:23] but I started and there is a lot to refactor [18:18:37] ah :) [18:18:56] that doesn't even have 1/10 of the total work [18:19:10] and it is already outdated (would require manual rebase) [18:19:27] but I hope the intention is clear [18:19:35] (03PS1) 10Yuvipanda: labsdns: Add tools-redis to DNS alias as well [puppet] - 10https://gerrit.wikimedia.org/r/296947 (https://phabricator.wikimedia.org/T139190) [18:19:55] (03PS2) 10Yuvipanda: labsdns: Add tools-redis to DNS alias as well [puppet] - 10https://gerrit.wikimedia.org/r/296947 (https://phabricator.wikimedia.org/T139190) [18:20:04] i think so, yes [18:20:39] there are several things to move there: grants [18:20:42] db config [18:20:44] crons [18:21:23] in that case it was about where the icinga plugin should live [18:21:29] in the module that it monitors [18:21:35] or in the icinga module with all other plugins [18:21:41] should that be on mariadb? [18:21:47] it is generic, isn't it? 
[18:21:54] it's check_mariadb [18:21:59] (03PS1) 10Krinkle: Move /docroot/default/index.html to /errorpages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296948 (https://phabricator.wikimedia.org/T113114) [18:22:04] and currently in files/icinga/ [18:22:05] yes, that should go to mariadb [18:22:31] so, the plan is to remove the subrepo [18:22:32] https://gerrit.wikimedia.org/r/#/c/296921/ [18:22:47] i like that :) (suckmodules :) [18:22:48] then move away from the "main" namespace [18:22:57] the generic parts to mariadb [18:23:08] the "deployment parts" to db_maintenance [18:23:19] (I accept suggestions for better names) [18:23:31] (03CR) 10Krinkle: [C: 032] Move /docroot/default/index.html to /errorpages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296948 (https://phabricator.wikimedia.org/T113114) (owner: 10Krinkle) [18:24:11] (03Merged) 10jenkins-bot: Move /docroot/default/index.html to /errorpages [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296948 (https://phabricator.wikimedia.org/T113114) (owner: 10Krinkle) [18:24:31] (03CR) 10Yuvipanda: [C: 032] labsdns: Add tools-redis to DNS alias as well [puppet] - 10https://gerrit.wikimedia.org/r/296947 (https://phabricator.wikimedia.org/T139190) (owner: 10Yuvipanda) [18:24:34] sounds good [18:24:53] now i can just move this file .. or do the lint:ignore [18:25:04] or wait until it's not a submodule [18:25:31] mutante I merged your change [18:25:51] yuvipanda: i .. i saw "No changes to merge" from puppet-merge [18:26:02] right, maybe because I merged yours? 
[18:26:11] Yuvipanda: labsdns: Add tools-redis to DNS alias as well (6863a8c) [18:26:11] Dzahn: Revert "mariadb: update subdmodule for lint-ignore" (b964545) [18:26:12] Dzahn: mariadb: update subdmodule for lint-ignore (db504d5) [18:26:12] I would join together the update submodule and the puppet update [18:26:22] so it is a single action [18:26:52] other than that- this is my evening, and by rule I do not alter highly-paging alerts at this time of the day [18:27:05] mutante maybe because those two were noops because they're reverting each other [18:27:12] and it only showed up when I added another commit [18:27:15] yuvipanda: yes, that's what i think [18:27:20] (this could generate 500 SMS), so I would be careful [18:27:45] ok, i am not moving the check_mariadb file in any way [18:27:51] just the special comment [18:27:59] that was kind of why i did that first [18:28:11] I can own that other change next week [18:28:19] ok, thanks [18:28:32] with a "just in case" downtime [18:28:59] can also copy instead of move [18:29:07] and later delete in old location [18:31:05] (03PS1) 10Dzahn: update submodule for lint-ignore, the right way [puppet] - 10https://gerrit.wikimedia.org/r/296949 [18:31:26] !log krinkle@tin Synchronized errorpages/: (no message) (duration: 01m 06s) [18:31:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:31:36] (03PS2) 10Dzahn: mariadb: update submodule for lint-ignore, the right way [puppet] - 10https://gerrit.wikimedia.org/r/296949 [18:31:55] +Subproject commit 6e14c9effbf5dbd75ec6ac3d5d50020850df65c9 [18:32:25] !log krinkle@tin Synchronized docroot/default/: (no message) (duration: 00m 31s) [18:32:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:32:33] scap shows that mw1259 is having problems [18:32:41] No route to host [18:32:45] All others are fine [18:32:55] (happened twice, for both syncs just now) [18:33:01] ok, looking [18:33:11] from tin [18:34:06] yea, 
crashed [18:34:14] garbled output on mgmt [18:34:18] rebooting [18:34:34] No alert about it? [18:34:51] !log mw1259 - powercycling [18:34:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [18:35:24] it's in Icinga with a comment and a ticket link [18:35:29] comment is "hardware problems" [18:35:48] T138921 [18:35:48] T138921: mw1103, mw1120 , mw1259 stuck after reboot - https://phabricator.wikimedia.org/T138921 [18:36:13] Last message in logstash was around midnight Jun 29 [18:36:39] yep. matches ticket [18:36:49] did not come back from reboot [18:37:04] error: seriallportw`com1'aisn'ttfound.rted. [18:37:05] Press any key to continue... [18:37:17] well.. it does something [18:38:06] it's started now [18:39:09] 06Operations, 10ops-eqiad, 13Patch-For-Review: mw1103, mw1120 , mw1259 stuck after reboot - https://phabricator.wikimedia.org/T138921#2414050 (10Dzahn) mw1259 was reported as down and garbled output on mgmt. i powercycled.. then saw this ticket from the icinga comment.. mw1259 came back up .. for now [18:39:37] (03PS3) 10Dzahn: mariadb: update submodule for lint-ignore, the right way [puppet] - 10https://gerrit.wikimedia.org/r/296949 [18:40:48] (03CR) 10Dzahn: [C: 032] mariadb: update submodule for lint-ignore, the right way [puppet] - 10https://gerrit.wikimedia.org/r/296949 (owner: 10Dzahn) [18:41:25] PROBLEM - DPKG on ytterbium is CRITICAL: DPKG CRITICAL dpkg reports broken packages [18:41:34] (03PS1) 10Chad: Gerrit: move nasty ssh key to crons class, only user [puppet] - 10https://gerrit.wikimedia.org/r/296951 [18:57:16] (03PS1) 10Dzahn: puppet-lint.rc: remove exception for puppet URLs without modules [puppet] - 10https://gerrit.wikimedia.org/r/296954 (https://phabricator.wikimedia.org/T93645) [19:00:10] Hello, i used to have access to hafnium but i can no longer log in. should i be proxying still through bast1001.wikimedia.org? 
[19:00:38] nuria_: yes, or bast2001, 3001 or 4001 should work as well [19:00:57] mutante: k, i guess i no longer have access [19:01:12] i can confirm your user is still on hafnium [19:01:49] but Failed publickey for nuria [19:01:55] in logs [19:02:14] RECOVERY - DPKG on ytterbium is OK: All packages OK [19:02:38] yea, hmm, it looks like there is no ssh key [19:02:41] i dont know why [19:03:49] hmm [19:04:40] maybe this? This patch _actually_ completes the migration of XHGui from hafnium to [19:04:43] tungsten. [19:06:26] nuria_: the group "perf-roots" is on it only. but i dont know the context [19:07:03] ori: do you know ? [19:07:40] analytics should have access [19:07:45] if we can add that group, that would be great [19:07:58] but not aware of a recent change? [19:08:55] not off the top of my head [19:12:47] mutante: adding analytics will be great as we used to have access but it was specified on 1 by 1 basis [19:12:52] mutante: not as a group [19:13:10] i administer hafnium and it's fine by me (i can also approve as manager) [19:16:30] !log restarted navtiming on hafnium; stopped receiving messages from EL 0mq publisher [19:16:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:17:50] !log restarted coal on graphite1001 stopped receiving messages from EL 0mq publisher [19:17:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [19:20:36] (03CR) 10Ori.livneh: [C: 031] use SO_REUSEPORT in code, not SO_REUSEADDR [software/statsdlb] - 10https://gerrit.wikimedia.org/r/296920 (https://phabricator.wikimedia.org/T126447) (owner: 10Filippo Giunchedi) [19:28:26] ori: I see navtiming.py is also a subscriber to zeromq [19:30:09] 06Operations, 10Ops-Access-Requests: access: eventlogging-admins -> hafnium - https://phabricator.wikimedia.org/T139202#2422507 (10Dzahn) [19:31:05] 06Operations, 10Mobile-Content-Service, 10ORES, 06Services: Investigate increased memory pressure on scb1001/2 - 
https://phabricator.wikimedia.org/T139177#2422520 (10bearND) [19:31:07] mutante, hmm. when two hiera files conflict on admin::groups, what happens? [19:31:52] 06Operations, 10Ops-Access-Requests: access: eventlogging-admins -> hafnium - https://phabricator.wikimedia.org/T139202#2422523 (10Dzahn) for example there is T96164 which gave access to mforns on this in the past [19:32:48] Krenair: i dont know, the first one wins? [19:33:03] which is the first? [19:33:04] 06Operations, 10Mobile-Content-Service, 10ORES, 06Services: Investigate increased memory pressure on scb1001/2 - https://phabricator.wikimedia.org/T139177#2421697 (10bearND) Added Ops to bring this to their attention. I hope we can find a way to move the ORES service or the other RB services to separate pr... [19:34:20] Krenair: something like: [19:34:25] https://wikitech.wikimedia.org/wiki/Puppet_Hiera#In_Labs [19:34:32] the "resolution order" part there [19:35:13] first by hostname.. then regex [19:35:22] later role [19:35:26] afaict [19:54:54] 06Operations, 10Ops-Access-Requests: access: eventlogging-admins -> hafnium - https://phabricator.wikimedia.org/T139202#2422554 (10Dzahn) so the task is to restore that access.. also: 12:13 < ori> if we can add that group, that would be great 12:18 < ori> i administer hafnium and it's fine by me (i can also... [20:02:21] !bash milimetric: it's the best lib for screen scraping, but that's like being the best tool to have when you have been trapped under an avalanche of garbage [20:02:21] ori: Stored quip at https://tools.wmflabs.org/bash/quip/AVWoDg8ZgCrwkbTdmcjL [20:03:40] 06Operations, 10Ops-Access-Requests: access: eventlogging-admins -> hafnium - https://phabricator.wikimedia.org/T139202#2422507 (10ori) Approved, yes. 
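Editor's note on the 19:31-19:35 exchange above: the tentative answer given there ("first by hostname.. then regex, later role, afaict") describes a first-match-wins lookup. A minimal sketch of that idea follows, assuming the order really is hostname, then regex, then role. The function `resolve`, the three dictionaries, and the group assignments are all hypothetical names made up for illustration; this is not Hiera's API and not the actual Wikimedia data.

```python
import re
from typing import Any, Dict, Optional

# One lookup layer: selector (hostname, regex, or role) -> {key: value}.
Layer = Dict[str, Dict[str, Any]]


def resolve(key: str, hostname: str, role: str,
            by_host: Layer, by_regex: Layer, by_role: Layer) -> Optional[Any]:
    """Return the first value found for `key`; earlier layers shadow later ones."""
    # 1. Exact hostname match (assumed to be the most specific layer).
    if key in by_host.get(hostname, {}):
        return by_host[hostname][key]
    # 2. First regex, in insertion order, that matches the hostname.
    for pattern, values in by_regex.items():
        if re.search(pattern, hostname) and key in values:
            return values[key]
    # 3. Role-level data.
    if key in by_role.get(role, {}):
        return by_role[role][key]
    return None


# Hypothetical data: these group assignments are invented for the example.
by_host = {"hafnium": {"admin::groups": ["perf-roots"]}}
by_regex = {r"^mw1\d+$": {"admin::groups": ["deployment"]}}
by_role = {"eventlogging": {"admin::groups": ["eventlogging-admins"]}}

for host in ("hafnium", "mw1259", "otherhost"):
    print(host, resolve("admin::groups", host, "eventlogging",
                        by_host, by_regex, by_role))
```

In this model the two lists are never merged: a host-level `admin::groups` hides the role-level one entirely. Whether the real setup merges the key instead is not settled in the discussion, so treat the sketch as one possible reading of "the first one wins".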
[20:04:57] (03PS1) 10Ori.livneh: Bump $wgResourceLoaderMaxQueryLength to 4,000 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296963 [20:05:04] PROBLEM - Disk space on elastic1020 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 80531 MB (15% inode=99%) [20:05:19] (03PS2) 10Ori.livneh: Bump $wgResourceLoaderMaxQueryLength to 4,000 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296963 [20:05:38] (03CR) 10Ori.livneh: [C: 032] Bump $wgResourceLoaderMaxQueryLength to 4,000 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296963 (owner: 10Ori.livneh) [20:06:12] (03PS2) 10Dzahn: puppet-lint: remove exception for puppet URLs without modules [puppet] - 10https://gerrit.wikimedia.org/r/296954 (https://phabricator.wikimedia.org/T93645) [20:06:28] (03Merged) 10jenkins-bot: Bump $wgResourceLoaderMaxQueryLength to 4,000 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296963 (owner: 10Ori.livneh) [20:06:45] (03CR) 10Dzahn: [C: 032] "operations-puppet-puppetlint-strict SUCCESS in 1m 01s proofs they are all gone (or ignored)" [puppet] - 10https://gerrit.wikimedia.org/r/296954 (https://phabricator.wikimedia.org/T93645) (owner: 10Dzahn) [20:08:16] !log ori@tin Synchronized wmf-config/CommonSettings.php: I6eb0ae67: Bump $wgResourceLoaderMaxQueryLength to 4,000 (duration: 00m 26s) [20:08:20] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [20:09:22] (03PS3) 10Merlijn van Deen: [DO NOT SUBMIT] test for tool labs puppet compiler [puppet] - 10https://gerrit.wikimedia.org/r/254183 [20:11:53] PROBLEM - Disk space on elastic1020 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 79416 MB (15% inode=99%) [20:25:59] * ebernhardson sighs at elasticsearch's random allocation algorithm.... 
[20:28:43] PROBLEM - Ulsfo HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 10.00% of data above the critical threshold [1000.0] [20:32:35] (03PS1) 10Hashar: (WIP) rspec config for the whole repository (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/296965 (https://phabricator.wikimedia.org/T78342) [20:34:24] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 11.11% of data above the critical threshold [1000.0] [20:34:41] 06Operations, 10Continuous-Integration-Infrastructure, 13Patch-For-Review: Create a basic RSpec unit test for operations/puppet - https://phabricator.wikimedia.org/T78342#2422635 (10hashar) https://gerrit.wikimedia.org/r/296965 is an attempt to let us run `rspec` from the root of the repo. That terribly fai... [20:44:58] (03CR) 10Nuria: [C: 031] explicitly specify identity for EL legacy zmq forwarder [puppet] - 10https://gerrit.wikimedia.org/r/296959 (owner: 10Ori.livneh) [20:45:33] PROBLEM - Disk space on elastic1019 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 79313 MB (15% inode=99%) [20:45:54] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [20:47:13] RECOVERY - Ulsfo HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0] [20:53:24] (03PS3) 10Brian Wolff: Add Content-Security-Policy to images from test[2]wiki [puppet] - 10https://gerrit.wikimedia.org/r/296634 (https://phabricator.wikimedia.org/T117618) [20:54:21] (03CR) 10Brian Wolff: "That's embarrassing. 
Im going to work on setting up varnish locally (probably not anything like wmf's set up though) so I can actually tes" [puppet] - 10https://gerrit.wikimedia.org/r/296634 (https://phabricator.wikimedia.org/T117618) (owner: 10Brian Wolff) [20:57:44] PROBLEM - Disk space on elastic1020 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 80412 MB (15% inode=99%) [21:01:07] (03PS1) 10Ori.livneh: Bump $wgResourceLoaderMaxQueryLength to 5,000 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296975 [21:01:36] (03CR) 10Ori.livneh: [C: 032] Bump $wgResourceLoaderMaxQueryLength to 5,000 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296975 (owner: 10Ori.livneh) [21:02:23] (03Merged) 10jenkins-bot: Bump $wgResourceLoaderMaxQueryLength to 5,000 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/296975 (owner: 10Ori.livneh) [21:04:39] !log ori@tin Synchronized wmf-config/CommonSettings.php: I7a95c0f4: Bump $wgResourceLoaderMaxQueryLength to 5,000 (duration: 00m 32s) [21:04:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [21:09:39] 06Operations, 10Beta-Cluster-Infrastructure, 07HHVM: Convert work machines (tin, terbium) to Trusty and hhvm usage - https://phabricator.wikimedia.org/T87036#982188 (10Paladox) Should this be updated to use Jessie. [21:39:33] 06Operations, 10Ops-Access-Requests: access: eventlogging-admins -> hafnium - https://phabricator.wikimedia.org/T139202#2422783 (10Dzahn) this might be the cause https://gerrit.wikimedia.org/r/#/c/221634/ [21:40:09] (03CR) 10Dzahn: "did this cause https://phabricator.wikimedia.org/T139202 ?" [puppet] - 10https://gerrit.wikimedia.org/r/221634 (owner: 10Ottomata) [21:43:22] PROBLEM - Disk space on elastic1019 is CRITICAL: DISK CRITICAL - free space: /var/lib/elasticsearch 77087 MB (15% inode=99%) [21:43:33] I'm locked out of a mailman list that I administer. What's the Right(TM) way to reset the password? 
[21:43:35] FYI: I'm ahalfake [21:43:39] https://lists.wikimedia.org/mailman/admin/wiki-research-l [21:43:42] *ahalfaker [21:46:12] RECOVERY - Disk space on elastic1020 is OK: DISK OK [21:47:31] (03PS2) 10Chad: WIP: Gerrit: Setup rsync between old and new machines [puppet] - 10https://gerrit.wikimedia.org/r/296957 [21:47:56] (03CR) 10Chad: "Rebase since it didn't depend on that parent" [puppet] - 10https://gerrit.wikimedia.org/r/296957 (owner: 10Chad) [21:48:22] halfak: a) you could ask dtaraborelli for the password [21:48:55] b) perhaps see if you have it in an email from November 2015 [21:49:17] c) open a phabricator task [21:49:42] b wins! [21:49:45] Thanks Platonides :) [21:49:57] * halfak updates his password manager [21:53:07] halfak: fyi, I use listadmin, which has a .ini config file with list addresses and admin passwords. purpose is to quickly and via the cli approve/discard held messages (spam, non-subscribed senders, etc). Makes it easy. [21:54:46] halfak: eg https://phabricator.wikimedia.org/P3327 [21:55:30] greg-g, that's pretty nice. [21:55:54] This thing https://github.com/trentm/listadmin ? [21:56:14] apt-get install listadmin [21:56:19] oh! Even better [21:56:22] :) [21:56:52] RECOVERY - Disk space on elastic1019 is OK: DISK OK [22:00:52] PROBLEM - mobileapps endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [22:03:13] RECOVERY - mobileapps endpoints health on scb1001 is OK: All endpoints are healthy [22:12:33] (03PS1) 10ArielGlenn: fix flow full history job call for xml dumps [dumps] - 10https://gerrit.wikimedia.org/r/297104 [22:14:03] (03CR) 10ArielGlenn: [C: 032] fix flow full history job call for xml dumps [dumps] - 10https://gerrit.wikimedia.org/r/297104 (owner: 10ArielGlenn) [22:14:04] halfak: on ores.wmflabs.org , do we expect http://oresweb/node/ores-web-03 to exist ? 
looks 404 [22:22:24] !log krinkle@tin Synchronized php-1.28.0-wmf.8/extensions/WikimediaEvents/modules/: T128115 (duration: 00m 30s) [22:22:25] T128115: Drop support for ES3 javascript browsers in MediaWiki - https://phabricator.wikimedia.org/T128115 [22:22:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [22:23:04] !log krinkle@tin Synchronized php-1.28.0-wmf.8/extensions/WikimediaEvents/extension.json: T128115 (duration: 00m 37s) [22:23:04] T128115: Drop support for ES3 javascript browsers in MediaWiki - https://phabricator.wikimedia.org/T128115 [22:23:07] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log, Master [22:34:05] 06Operations, 10Mobile-Content-Service, 10ORES, 06Services: Investigate increased memory pressure on scb1001/2 - https://phabricator.wikimedia.org/T139177#2421697 (10Ladsgroup) We can reduce number of web workers to 3/4 for now (by changing https://github.com/wikimedia/operations-puppet/blob/production/mod... [23:00:35] (03PS2) 10Ori.livneh: explicitly specify identity for EL legacy zmq forwarder [puppet] - 10https://gerrit.wikimedia.org/r/296959 [23:00:51] (03CR) 10Ori.livneh: [C: 032 V: 032] explicitly specify identity for EL legacy zmq forwarder [puppet] - 10https://gerrit.wikimedia.org/r/296959 (owner: 10Ori.livneh) [23:09:01] 06Operations, 10Ops-Access-Requests: New SSH key for AGreen - https://phabricator.wikimedia.org/T139213#2422989 (10awight) [23:39:20] (03PS3) 10Dzahn: icinga/mariadb: move plugin into module [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/296921 [23:39:56] (03PS4) 10Dzahn: icinga/mariadb: move plugin into module [puppet/mariadb] - 10https://gerrit.wikimedia.org/r/296921