[00:00:11] (CR) Reedy: [C: 2] Enable MediaWiki.AlternativeSyntax.AlternativeSyntax.AlternativeSyntax [mediawiki-config] - https://gerrit.wikimedia.org/r/366762 (owner: Reedy)
[00:00:55] (Merged) jenkins-bot: Enable Squiz.WhiteSpace.LanguageConstructSpacing.Incorrect and Squiz.WhiteSpace.LanguageConstructSpacing.IncorrectSingle [mediawiki-config] - https://gerrit.wikimedia.org/r/366761 (owner: Reedy)
[00:01:41] (Merged) jenkins-bot: Enable MediaWiki.AlternativeSyntax.AlternativeSyntax.AlternativeSyntax [mediawiki-config] - https://gerrit.wikimedia.org/r/366762 (owner: Reedy)
[00:03:38] (PS1) Reedy: Enable Generic.Formatting.DisallowMultipleStatements.SameLine [mediawiki-config] - https://gerrit.wikimedia.org/r/366764
[00:04:45] (CR) Reedy: [C: 2] Enable Generic.Formatting.DisallowMultipleStatements.SameLine [mediawiki-config] - https://gerrit.wikimedia.org/r/366764 (owner: Reedy)
[00:05:39] !log reedy@tin Synchronized wmf-config/missing.php: phpcs (duration: 00m 43s)
[00:05:48] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:09:10] (Merged) jenkins-bot: Enable Generic.Formatting.DisallowMultipleStatements.SameLine [mediawiki-config] - https://gerrit.wikimedia.org/r/366764 (owner: Reedy)
[00:09:23] !log reedy@tin Synchronized docroot/: phpcs (duration: 00m 44s)
[00:09:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:10:42] (PS1) Reedy: Enable Generic.CodeAnalysis.UnconditionalIfStatement.Found and Generic.Files.EndFileNewline.NotFound [mediawiki-config] - https://gerrit.wikimedia.org/r/366765
[00:11:14] !log reedy@tin Synchronized phpcs.xml: phpcs (duration: 00m 43s)
[00:11:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:17:16] (CR) Reedy: [C: 2] Enable Generic.CodeAnalysis.UnconditionalIfStatement.Found and Generic.Files.EndFileNewline.NotFound [mediawiki-config] - https://gerrit.wikimedia.org/r/366765
(owner: Reedy)
[00:18:37] (Merged) jenkins-bot: Enable Generic.CodeAnalysis.UnconditionalIfStatement.Found and Generic.Files.EndFileNewline.NotFound [mediawiki-config] - https://gerrit.wikimedia.org/r/366765 (owner: Reedy)
[00:21:27] !log reedy@tin Synchronized docroot/noc/db.php: phpcs (duration: 00m 43s)
[00:21:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:22:31] !log reedy@tin Synchronized wmf-config/wikitech.php: phpcs (duration: 00m 43s)
[00:22:40] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:27:12] (PS1) Reedy: Enable Generic.Formatting.MultipleStatementAlignment.IncorrectWarning and PSR2.Classes.PropertyDeclaration.ScopeMissing [mediawiki-config] - https://gerrit.wikimedia.org/r/366767
[00:27:59] (CR) Reedy: [C: 2] Enable Generic.Formatting.MultipleStatementAlignment.IncorrectWarning and PSR2.Classes.PropertyDeclaration.ScopeMissing [mediawiki-config] - https://gerrit.wikimedia.org/r/366767 (owner: Reedy)
[00:29:41] (Merged) jenkins-bot: Enable Generic.Formatting.MultipleStatementAlignment.IncorrectWarning and PSR2.Classes.PropertyDeclaration.ScopeMissing [mediawiki-config] - https://gerrit.wikimedia.org/r/366767 (owner: Reedy)
[00:33:00] (Draft1) Paladox: Gerrit: Make ldap servers configuable [puppet] - https://gerrit.wikimedia.org/r/366768
[00:33:04] (PS2) Paladox: Gerrit: Make ldap servers configuable [puppet] - https://gerrit.wikimedia.org/r/366768
[00:34:01] (CR) jerkins-bot: [V: -1] Gerrit: Make ldap servers configuable [puppet] - https://gerrit.wikimedia.org/r/366768 (owner: Paladox)
[00:37:02] (PS3) Paladox: Gerrit: Make ldap servers configuable [puppet] - https://gerrit.wikimedia.org/r/366768
[00:39:51] (PS1) Reedy: Various array() to [] [mediawiki-config] - https://gerrit.wikimedia.org/r/366769
[00:41:25] (CR) Reedy: [C: 2] Various array() to [] [mediawiki-config] - https://gerrit.wikimedia.org/r/366769
(owner: Reedy)
[00:43:06] (Merged) jenkins-bot: Various array() to [] [mediawiki-config] - https://gerrit.wikimedia.org/r/366769 (owner: Reedy)
[00:43:56] (PS4) Paladox: Gerrit: Make ldap servers configuable [puppet] - https://gerrit.wikimedia.org/r/366768
[00:44:20] !log reedy@tin Synchronized tests/: phpcs (duration: 00m 43s)
[00:44:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:44:55] (CR) jerkins-bot: [V: -1] Gerrit: Make ldap servers configuable [puppet] - https://gerrit.wikimedia.org/r/366768 (owner: Paladox)
[00:45:15] sorry for spam
[00:45:38] !log reedy@tin Synchronized wmf-config/: phpcs (duration: 00m 44s)
[00:45:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:46:33] !log reedy@tin Synchronized w: phpcs (duration: 00m 43s)
[00:46:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:47:33] !log reedy@tin Synchronized rpc/RunJobs.php: phpcs (duration: 00m 43s)
[00:47:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:47:48] (PS5) Paladox: Gerrit: Make ldap servers configuable [puppet] - https://gerrit.wikimedia.org/r/366768
[00:48:09] (PS6) Paladox: Gerrit: Make ldap servers configuable [puppet] - https://gerrit.wikimedia.org/r/366768
[00:48:29] !log reedy@tin Synchronized search-redirect.php: phpcs (duration: 00m 43s)
[00:48:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[01:21:29] (PS1) Reedy: Function comments, parameters and stuffs [mediawiki-config] - https://gerrit.wikimedia.org/r/366771
[01:21:59] (CR) jerkins-bot: [V: -1] Function comments, parameters and stuffs [mediawiki-config] - https://gerrit.wikimedia.org/r/366771 (owner: Reedy)
[01:22:07] (PS2) Reedy: Function comments, parameters and stuffs [mediawiki-config] - https://gerrit.wikimedia.org/r/366771
[01:26:13] (PS7) Paladox: Gerrit: Make ldap servers configuable
[puppet] - https://gerrit.wikimedia.org/r/366768
[01:28:12] (PS8) Paladox: Gerrit: Make ldap servers configuable [puppet] - https://gerrit.wikimedia.org/r/366768
[01:28:50] (PS9) Paladox: Gerrit: Make ldap servers configuable [puppet] - https://gerrit.wikimedia.org/r/366768
[01:52:48] (PS1) Reedy: Mostly re-enable Generic.Arrays.DisallowLongArraySyntax.Found [mediawiki-config] - https://gerrit.wikimedia.org/r/366776
[01:53:04] (PS2) Reedy: Mostly re-enable Generic.Arrays.DisallowLongArraySyntax.Found [mediawiki-config] - https://gerrit.wikimedia.org/r/366776
[01:56:59] Operations, Ops-Access-Requests: Requesting access to tools.speedydeletionwikia for Dylann1024 (Nathan Larson) - https://phabricator.wikimedia.org/T171130#3459117 (Mdupont) stalled>Invalid
[02:06:13] Operations, Ops-Access-Requests: Requesting access to tools.speedydeletionwikia for Dylann1024 (Nathan Larson) - https://phabricator.wikimedia.org/T171130#3459121 (Mdupont) hi, I don't know this person or if he/she is who they claim to be, but they offered to help with the sd project. I could not add th...
[04:03:10] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 232, down: 1, dormant: 0, excluded: 0, unused: 0
[04:17:30] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=800.10 Read Requests/Sec=483.70 Write Requests/Sec=24.90 KBytes Read/Sec=53385.60 KBytes_Written/Sec=186.40
[04:23:58] (CR) Krinkle: [C: 1] Mostly re-enable Generic.Arrays.DisallowLongArraySyntax.Found [mediawiki-config] - https://gerrit.wikimedia.org/r/366776 (owner: Reedy)
[04:24:44] (CR) Krinkle: Function comments, parameters and stuffs (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/366771 (owner: Reedy)
[04:25:17] (CR) Krinkle: Function comments, parameters and stuffs (2 comments) [mediawiki-config] - https://gerrit.wikimedia.org/r/366771 (owner: Reedy)
[04:26:30] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=1.10 Read Requests/Sec=0.60 Write Requests/Sec=0.50 KBytes Read/Sec=31.20 KBytes_Written/Sec=8.40
[05:07:49] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1051" [mediawiki-config] - https://gerrit.wikimedia.org/r/366787
[05:07:58] (CR) jerkins-bot: [V: -1] Revert "db-eqiad.php: Depool db1051" [mediawiki-config] - https://gerrit.wikimedia.org/r/366787 (owner: Marostegui)
[05:43:51] (PS1) Smalyshev: Enable Cirrus search of wbsearchentities when using useCirrus=1 [mediawiki-config] - https://gerrit.wikimedia.org/r/366788 (https://phabricator.wikimedia.org/T125500)
[05:45:09] (PS2) Smalyshev: Enable Cirrus search of wbsearchentities when using useCirrus=1 [mediawiki-config] - https://gerrit.wikimedia.org/r/366788 (https://phabricator.wikimedia.org/T125500)
[05:45:20] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 234, down: 0, dormant: 0, excluded: 0, unused: 0
[06:02:45] Operations, MediaWiki-Platform-Team, monitoring: High levels of PoolCounter
errors should trigger alerts - https://phabricator.wikimedia.org/T133318#2228045 (tstarling) MW already provides a log of all PoolCounter errors, including queue overflow, in the poolcounter channel. So this is presumably jus...
[06:03:50] PROBLEM - BGP status on cr1-eqiad is CRITICAL: BGP CRITICAL - AS2914/IPv6: Active, AS2914/IPv4: Active
[06:20:51] RECOVERY - BGP status on cr1-eqiad is OK: BGP OK - up: 27, down: 0, shutdown: 2
[06:22:50] PROBLEM - Router interfaces on cr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.196, interfaces up: 232, down: 1, dormant: 0, excluded: 0, unused: 0
[06:26:50] RECOVERY - Router interfaces on cr1-eqiad is OK: OK: host 208.80.154.196, interfaces up: 234, down: 0, dormant: 0, excluded: 0, unused: 0
[06:29:00] PROBLEM - BGP status on cr1-eqiad is CRITICAL: BGP CRITICAL - AS2914/IPv4: Active, AS2914/IPv6: Active
[06:41:28] (Abandoned) Marostegui: Revert "db-eqiad.php: Depool db1051" [mediawiki-config] - https://gerrit.wikimedia.org/r/366787 (owner: Marostegui)
[06:43:32] (PS1) Marostegui: db-eqiad.php: Repool db1051 [mediawiki-config] - https://gerrit.wikimedia.org/r/366791 (https://phabricator.wikimedia.org/T166204)
[06:44:51] (CR) jerkins-bot: [V: -1] db-eqiad.php: Repool db1051 [mediawiki-config] - https://gerrit.wikimedia.org/r/366791 (https://phabricator.wikimedia.org/T166204) (owner: Marostegui)
[06:46:18] (PS2) Marostegui: db-eqiad.php: Repool db1051 [mediawiki-config] - https://gerrit.wikimedia.org/r/366791 (https://phabricator.wikimedia.org/T166204)
[06:47:41] (CR) jerkins-bot: [V: -1] db-eqiad.php: Repool db1051 [mediawiki-config] - https://gerrit.wikimedia.org/r/366791 (https://phabricator.wikimedia.org/T166204) (owner: Marostegui)
[06:48:30] meh
[06:51:13] (PS3) Marostegui: db-eqiad.php: Repool db1051 [mediawiki-config] - https://gerrit.wikimedia.org/r/366791 (https://phabricator.wikimedia.org/T166204)
[07:07:10] RECOVERY - BGP status on cr1-eqiad is OK: BGP OK - up: 29,
down: 0, shutdown: 0
[07:22:58] (CR) Marostegui: [C: 2] db-eqiad.php: Repool db1051 [mediawiki-config] - https://gerrit.wikimedia.org/r/366791 (https://phabricator.wikimedia.org/T166204) (owner: Marostegui)
[07:24:24] (Merged) jenkins-bot: db-eqiad.php: Repool db1051 [mediawiki-config] - https://gerrit.wikimedia.org/r/366791 (https://phabricator.wikimedia.org/T166204) (owner: Marostegui)
[07:25:20] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Repool db1051 - T166204 (duration: 00m 44s)
[07:25:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:25:32] T166204: Convert unique keys into primary keys for some wiki tables on s1 - https://phabricator.wikimedia.org/T166204
[07:39:18] (PS1) Marostegui: db-eqiad.php: Fix some indents [mediawiki-config] - https://gerrit.wikimedia.org/r/366795
[07:40:37] (CR) jerkins-bot: [V: -1] db-eqiad.php: Fix some indents [mediawiki-config] - https://gerrit.wikimedia.org/r/366795 (owner: Marostegui)
[07:41:06] (PS2) Marostegui: db-eqiad.php: Fix some indents [mediawiki-config] - https://gerrit.wikimedia.org/r/366795
[07:42:27] (CR) jerkins-bot: [V: -1] db-eqiad.php: Fix some indents [mediawiki-config] - https://gerrit.wikimedia.org/r/366795 (owner: Marostegui)
[07:44:11] PROBLEM - IPv4 ping to eqiad on ripe-atlas-eqiad is CRITICAL: CRITICAL - failed 35 probes of 275 (alerts on 19) - https://atlas.ripe.net/measurements/1790945/#!map
[07:44:14] Ah right, I see there were some changes made yesterday and now jenkins is super picky...great
[07:49:20] RECOVERY - IPv4 ping to eqiad on ripe-atlas-eqiad is OK: OK - failed 2 probes of 275 (alerts on 19) - https://atlas.ripe.net/measurements/1790945/#!map
[07:57:16] Operations, Wikidata, Wikidata-Sprint-2016-11-08: [Task] move wikiba.se webhosting to wikimedia misc-cluster - https://phabricator.wikimedia.org/T99531#3459430 (Ladsgroup)
[07:59:03] paravoid: hey, should I make a phab card for legal
review of wikiba.se?
[07:59:22] (PS3) Marostegui: db-eqiad.php: Fix some indents [mediawiki-config] - https://gerrit.wikimedia.org/r/366795
[08:00:47] (CR) jerkins-bot: [V: -1] db-eqiad.php: Fix some indents [mediawiki-config] - https://gerrit.wikimedia.org/r/366795 (owner: Marostegui)
[08:01:52] (PS4) Marostegui: db-eqiad.php: Fix some indents [mediawiki-config] - https://gerrit.wikimedia.org/r/366795
[08:03:09] (CR) jerkins-bot: [V: -1] db-eqiad.php: Fix some indents [mediawiki-config] - https://gerrit.wikimedia.org/r/366795 (owner: Marostegui)
[08:04:17] (PS5) Marostegui: db-eqiad.php: Fix some indents [mediawiki-config] - https://gerrit.wikimedia.org/r/366795
[08:06:53] (CR) Marostegui: [C: 2] db-eqiad.php: Fix some indents [mediawiki-config] - https://gerrit.wikimedia.org/r/366795 (owner: Marostegui)
[08:08:13] (Merged) jenkins-bot: db-eqiad.php: Fix some indents [mediawiki-config] - https://gerrit.wikimedia.org/r/366795 (owner: Marostegui)
[08:09:17] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Fix some indents (duration: 00m 43s)
[08:09:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:09:57] Amir1: they don't generally use phab for their issue tracking, so I guess not
[08:10:11] yeah, you're right
[08:12:02] Operations, ops-codfw: ms-be2024 not powering on - https://phabricator.wikimedia.org/T171275#3459460 (fgiunchedi)
[08:12:25] Also ops can merge this simple patch? https://gerrit.wikimedia.org/r/#/c/360891/ It's only for beta cluster and already cherry picked there and works just fine
[08:12:27] sigh, ms-be2024 seems dead in the water ^
[08:14:27] Amir1: I'll merge it, I guess it slipped puppet swat yesterday?
[08:14:44] I forgot to add it and today is Friday :(
[08:14:45] (PS4) Filippo Giunchedi: Add /data/ url redirect in beta cluster (Wikipedia only) [puppet] - https://gerrit.wikimedia.org/r/360891 (https://phabricator.wikimedia.org/T163922) (owner: Ladsgroup)
[08:14:45] sorry
[08:14:52] Thanks
[08:15:23] np, beta-only is acceptable to me on a Fri too :)
[08:15:54] +1
[08:16:06] (CR) Filippo Giunchedi: [C: 2] Add /data/ url redirect in beta cluster (Wikipedia only) [puppet] - https://gerrit.wikimedia.org/r/360891 (https://phabricator.wikimedia.org/T163922) (owner: Ladsgroup)
[08:16:17] Amir1: I left a comment on the other Rewrite patch, it seems that a "/" was missing?
[08:16:27] or maybe I misunderstood the whole thing
[08:16:45] andrewbogott: is your patch mergeable?
[08:17:04] ac9b2e0 that is
[08:17:28] elukey: yeah, I'm trying to get them in
[08:21:01] sure sure whenever you have time, I saw the other code review and thought to ask :)
[08:22:17] Operations, ops-eqiad: Degraded RAID on db1001 - https://phabricator.wikimedia.org/T171232#3459479 (jcrespo) @Cmjohnson - you should have 300 GB old disks, but if you don't I can tell you where to get some (decommed/unused servers). This one is going to soon be retired, but right now is still in use.
[08:25:40] PROBLEM - Unmerged changes on repository puppet on puppetmaster1001 is CRITICAL: There are 2 unmerged changes in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production).
[08:27:34] * elukey blames godog :P
[08:29:41] RECOVERY - Unmerged changes on repository puppet on puppetmaster1001 is OK: No changes to merge.
[08:29:48] elukey: https://zippy.gfycat.com/OldfashionedAcrobaticHapuka.webm
[08:31:18] godog: we can close the internet today
[08:31:26] made my day thanks
[08:31:51] hahaha you are welcome
[08:31:54] (also the people in my co-working are telling me that you are a genius)
[08:33:05] heheh I have a stash
[08:39:54] (PS1) Muehlenhoff: Extend account for akrausetud [puppet] - https://gerrit.wikimedia.org/r/366799
[08:42:34] (PS2) Muehlenhoff: Extend account for akrausetud [puppet] - https://gerrit.wikimedia.org/r/366799
[08:44:05] !log stopping replication on db2072 to fix some duplicate key errors
[08:44:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:46:02] (CR) jenkins-bot: Enable WikibaseQualityConstraints statements [mediawiki-config] - https://gerrit.wikimedia.org/r/363200 (https://phabricator.wikimedia.org/T169647) (owner: Lucas Werkmeister (WMDE))
[08:46:08] (CR) Muehlenhoff: [C: 2] Extend account for akrausetud [puppet] - https://gerrit.wikimedia.org/r/366799 (owner: Muehlenhoff)
[08:49:21] (CR) jenkins-bot: Configure WikibaseQualityConstraints extension [mediawiki-config] - https://gerrit.wikimedia.org/r/358553 (https://phabricator.wikimedia.org/T168938) (owner: Lucas Werkmeister (WMDE))
[08:49:37] (CR) jenkins-bot: Enable 3 squiz phpcs rules [mediawiki-config] - https://gerrit.wikimedia.org/r/366756 (owner: Reedy)
[08:59:28] Operations, Puppet, LDAP: Should puppet auto-restart slapd? - https://phabricator.wikimedia.org/T171191#3459545 (MoritzMuehlenhoff) Changes to slapd configs are fairly rare, the benefit of manual restarts is that it provides full control to only restart one slapd at a time (both the LDAP setups for t...
[09:03:04] (PS1) Marostegui: db-eqiad.php: Fix indents [mediawiki-config] - https://gerrit.wikimedia.org/r/366804
[09:04:51] (CR) jerkins-bot: [V: -1] db-eqiad.php: Fix indents [mediawiki-config] - https://gerrit.wikimedia.org/r/366804 (owner: Marostegui)
[09:06:17] (PS4) Ema: varnish cachestats.py: cache statsd server IP [puppet] - https://gerrit.wikimedia.org/r/366564 (https://phabricator.wikimedia.org/T151643)
[09:07:03] (CR) Jcrespo: "?" [mediawiki-config] - https://gerrit.wikimedia.org/r/366804 (owner: Marostegui)
[09:09:47] Operations, Cloud-Services: wikitech api action=query not returning list of instances - https://phabricator.wikimedia.org/T171280#3459555 (fgiunchedi)
[09:10:46] (PS2) Marostegui: db-eqiad.php: Fix indents [mediawiki-config] - https://gerrit.wikimedia.org/r/366804
[09:11:10] Operations, Cloud-Services: wikitech api list=novainstances not returning list of instances - https://phabricator.wikimedia.org/T171280#3459570 (fgiunchedi)
[09:12:05] (CR) Ema: varnish cachestats.py: cache statsd server IP (1 comment) [puppet] - https://gerrit.wikimedia.org/r/366564 (https://phabricator.wikimedia.org/T151643) (owner: Ema)
[09:12:13] (CR) jenkins-bot: Enable MediaWiki.WhiteSpace.SpaceBeforeControlStructureBrace.EmptyLines and MediaWiki.WhiteSpace.SpaceAfterControlStructure.Incorrect [mediawiki-config] - https://gerrit.wikimedia.org/r/366758 (owner: Reedy)
[09:12:28] (CR) jenkins-bot: Enable Squiz.WhiteSpace.LanguageConstructSpacing.Incorrect and Squiz.WhiteSpace.LanguageConstructSpacing.IncorrectSingle [mediawiki-config] - https://gerrit.wikimedia.org/r/366761 (owner: Reedy)
[09:12:30] (CR) jerkins-bot: [V: -1] db-eqiad.php: Fix indents [mediawiki-config] - https://gerrit.wikimedia.org/r/366804 (owner: Marostegui)
[09:12:39] (CR) jenkins-bot: Enable MediaWiki.AlternativeSyntax.AlternativeSyntax.AlternativeSyntax [mediawiki-config] -
https://gerrit.wikimedia.org/r/366762 (owner: Reedy)
[09:12:43] (CR) jenkins-bot: Enable Generic.Formatting.DisallowMultipleStatements.SameLine [mediawiki-config] - https://gerrit.wikimedia.org/r/366764 (owner: Reedy)
[09:12:50] (CR) jenkins-bot: Enable Generic.Formatting.MultipleStatementAlignment.IncorrectWarning and PSR2.Classes.PropertyDeclaration.ScopeMissing [mediawiki-config] - https://gerrit.wikimedia.org/r/366767 (owner: Reedy)
[09:12:58] (CR) jenkins-bot: Various array() to [] [mediawiki-config] - https://gerrit.wikimedia.org/r/366769 (owner: Reedy)
[09:13:21] (PS3) Marostegui: db-eqiad.php: Fix indents [mediawiki-config] - https://gerrit.wikimedia.org/r/366804
[09:13:23] (CR) jenkins-bot: db-eqiad.php: Repool db1051 [mediawiki-config] - https://gerrit.wikimedia.org/r/366791 (https://phabricator.wikimedia.org/T166204) (owner: Marostegui)
[09:13:48] (CR) jenkins-bot: db-eqiad.php: Fix some indents [mediawiki-config] - https://gerrit.wikimedia.org/r/366795 (owner: Marostegui)
[09:13:54] (CR) jenkins-bot: Re-enable MediaWiki.WhiteSpace.SpaceyParenthesis.SingleSpaceAfterOpenParenthesis and MediaWiki.WhiteSpace.SpaceyParenthesis.SingleSpaceBeforeCloseParenthesis [mediawiki-config] - https://gerrit.wikimedia.org/r/366750 (owner: Reedy)
[09:14:05] (CR) jenkins-bot: Add Author namespace on ta.wikisource [mediawiki-config] - https://gerrit.wikimedia.org/r/366626 (https://phabricator.wikimedia.org/T165813) (owner: Dereckson)
[09:14:12] (CR) jenkins-bot: Enable MediaWiki.WhiteSpace.SpaceyParenthesis.UnnecessarySpaceBetweenParentheses [mediawiki-config] - https://gerrit.wikimedia.org/r/366757 (owner: Reedy)
[09:14:49] (CR) jenkins-bot: Enable Generic.CodeAnalysis.UnconditionalIfStatement.Found and Generic.Files.EndFileNewline.NotFound [mediawiki-config] - https://gerrit.wikimedia.org/r/366765 (owner: Reedy)
[09:15:02] (CR) jenkins-bot: Enable
MediaWiki.WhiteSpace.SpaceBeforeSingleLineComment.SingleSpaceBeforeSingleLineComment [mediawiki-config] - https://gerrit.wikimedia.org/r/366759 (owner: Reedy)
[09:15:30] (CR) jenkins-bot: Enable Generic.ControlStructures.InlineControlStructure.NotAllowed [mediawiki-config] - https://gerrit.wikimedia.org/r/366760 (owner: Reedy)
[09:15:31] (PS1) Filippo Giunchedi: Don't show diffs for files with secret content [puppet] - https://gerrit.wikimedia.org/r/366806 (https://phabricator.wikimedia.org/T79881)
[09:17:15] (CR) jerkins-bot: [V: -1] Don't show diffs for files with secret content [puppet] - https://gerrit.wikimedia.org/r/366806 (https://phabricator.wikimedia.org/T79881) (owner: Filippo Giunchedi)
[09:19:43] (PS1) Filippo Giunchedi: puppetmaster: stop serving private via fileserver [puppet] - https://gerrit.wikimedia.org/r/366808 (https://phabricator.wikimedia.org/T79881)
[09:24:02] (PS2) Muehlenhoff: Add cache::misc hosts to network constants [puppet] - https://gerrit.wikimedia.org/r/366526
[09:25:27] (PS2) Filippo Giunchedi: Don't show diffs for files with secret content [puppet] - https://gerrit.wikimedia.org/r/366806 (https://phabricator.wikimedia.org/T79881)
[09:25:29] (PS2) Filippo Giunchedi: puppetmaster: stop serving private via fileserver [puppet] - https://gerrit.wikimedia.org/r/366808 (https://phabricator.wikimedia.org/T79881)
[09:28:43] Operations, monitoring, User-fgiunchedi: update diamond to latest upstream version - https://phabricator.wikimedia.org/T97635#3459627 (fgiunchedi)
[09:28:58] Operations, Cassandra, Services (blocked), User-Joe, User-fgiunchedi: Hyperthreading disabled on restbase2002.codfw.wmnet & restbase1015.codfw.wmnet - https://phabricator.wikimedia.org/T162735#3459628 (fgiunchedi)
[09:37:01] ACKNOWLEDGEMENT - Host ms-be2024 is DOWN: PING CRITICAL - Packet loss = 100% Filippo Giunchedi https://phabricator.wikimedia.org/T171275
[09:42:23] !log add 100G to
graphite2002/graphite1003 vgs
[09:42:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:42:38] that was supposed to be lvs
[09:42:59] Operations, Cloud-Services: wikitech api list=novainstances not returning list of instances - https://phabricator.wikimedia.org/T171280#3459555 (hashar) Code is in api/ApiListNovaInstances.php. Replaying it on silver: ``` $ mwscript eval.php --wiki=labswiki > global $wgOpenStackManagerLDAPUsername; > glo...
[09:45:39] (CR) Muehlenhoff: [C: 2] Add cache::misc hosts to network constants [puppet] - https://gerrit.wikimedia.org/r/366526 (owner: Muehlenhoff)
[09:46:43] Operations, Cloud-Services: wikitech api list=novainstances not returning list of instances - https://phabricator.wikimedia.org/T171280#3459705 (hashar) And in the nova logs, I also see 401 for the tools project for requests from Silver "GET /v2/tools/servers/detail HTTP/1.1" status: 401 len: 291
[09:48:24] Operations, ORES, Scoring-platform-team-Backlog, Graphite, User-fgiunchedi: Regularly purge old ores graphite metrics - https://phabricator.wikimedia.org/T169969#3459711 (fgiunchedi) >>! In T169969#3456586, @Halfak wrote: > I think we'd like to keep some high level metrics forever, others for...
[09:50:31] hi
[09:51:03] i need to update my ssh key https://gerrit.wikimedia.org/r/#/c/363180/
[09:52:15] aude: pro tip: you should be able to list several ssh keys there :]
[09:52:57] i know
[09:53:39] (PS1) Muehlenhoff: oresweb: Switch to $CACHE_MISC [puppet] - https://gerrit.wikimedia.org/r/366811
[09:55:30] (PS1) Muehlenhoff: Switch to $CACHE_MISC [puppet] - https://gerrit.wikimedia.org/r/366812
[09:56:56] Operations, Analytics, EventBus, Patch-For-Review, User-Elukey: Eventbus does not handle gracefully changes in DNS recursors - https://phabricator.wikimedia.org/T171048#3459727 (elukey) Opened an issue with upstream: https://github.com/sprockets/sprockets.clients.statsd/issues/7
[10:00:32] (PS1) Muehlenhoff: profile::microsites::annualreport: Switch to $CACHE_MISC [puppet] - https://gerrit.wikimedia.org/r/366815
[10:02:21] Operations, Wikimedia-Apache-configuration: https://test.wikipedia.org/wiki/Bug%3F?action=history doesn't show the history page, unlike https://test.wikipedia.org/w/index.php?title=Bug%3F&action=history - https://phabricator.wikimedia.org/T123276#3459741 (fgiunchedi) p:Triage>Normal
[10:02:57] (PS1) Muehlenhoff: profile::microsites::endowment: Switch to $CACHE_MISC [puppet] - https://gerrit.wikimedia.org/r/366817
[10:04:06] Operations, ops-codfw: mc2023 / mc2025 fail to mount root partition within 90 seconds using Linux 4.9 - https://phabricator.wikimedia.org/T170152#3459748 (fgiunchedi) p:Triage>Normal
[10:04:19] Operations: Upload nodejs 6.x to stretch-wikimedia - https://phabricator.wikimedia.org/T169763#3459749 (fgiunchedi) p:Triage>Normal
[10:04:24] (PS1) Muehlenhoff: profile::microsites::releases: Switch to $CACHE_MISC [puppet] - https://gerrit.wikimedia.org/r/366818
[10:05:30] Operations, Puppet: Use multiple puppetdbs on puppet masters - https://phabricator.wikimedia.org/T169318#3459765 (fgiunchedi) p:Triage>Normal
[10:06:18] (PS1) Muehlenhoff:
profile::microsites::static_bugzilla: Switch to $CACHE_MISC [puppet] - https://gerrit.wikimedia.org/r/366819
[10:07:05] Operations, ops-eqiad: Degraded RAID on db1001 - https://phabricator.wikimedia.org/T171232#3459767 (fgiunchedi) p:Triage>Normal
[10:08:02] (PS1) Muehlenhoff: profile::microsites::transparency: Switch to $CACHE_MISC [puppet] - https://gerrit.wikimedia.org/r/366820
[10:08:05] Operations, ops-eqiad: Degraded RAID on db1001 - https://phabricator.wikimedia.org/T171232#3458257 (Marostegui) Lovely, after I mentioned yesterday this host doesn't have any HW issues, a disk fails :-) I should have kept my mouth closed!
[10:10:27] Operations, Puppet, LDAP: Should puppet auto-restart slapd? - https://phabricator.wikimedia.org/T171191#3459785 (fgiunchedi) p:Triage>Normal re: changes of puppet restarting both nodes at the same time, see also {T161145}
[10:12:05] (PS2) Giuseppe Lavagetto: rake: new rakefile specifically for CI [puppet] - https://gerrit.wikimedia.org/r/366591 (https://phabricator.wikimedia.org/T166888)
[10:12:47] <_joe_> hashar: ^^ would you take a look?
[10:13:38] Operations, ops-eqiad: Degraded RAID on ms-be1016 - https://phabricator.wikimedia.org/T171183#3459799 (fgiunchedi) p:Triage>Normal a:Cmjohnson @Cmjohnson this seems similar to {T163777} `Cache: Permanently Disabled - Cable Error - Battery/Capacitor: Recharging` I suspect the machine will need...
[10:14:15] <_joe_> I'd rename the file Rakefie.ci if we like it, my simple tests showed a neat speedup
[10:15:13] Operations, ops-eqiad, OCG-General, Reading-Web-Backlog (Tracking), User-Joe: ocg1001 is broken - https://phabricator.wikimedia.org/T170886#3459807 (Joe) p:Triage>Normal a:Joe
[10:16:55] Operations, Fundraising-Backlog, fundraising-tech-ops: reports.frdev.wm.o -- still in use?
- https://phabricator.wikimedia.org/T170640#3459810 (fgiunchedi) p:Triage>Normal
[10:17:06] Operations, Commons, Thumbor, Traffic, media-storage: ERR_RESPONSE_HEADERS_MULTIPLE_CONTENT_DISPOSITION - https://phabricator.wikimedia.org/T170605#3459811 (fgiunchedi) p:Triage>Normal
[10:17:48] Operations, LDAP-Access-Requests: Add "chrisneuroth" to wmde LDAP group - https://phabricator.wikimedia.org/T170552#3459825 (fgiunchedi) p:Triage>Normal
[10:18:15] Operations, Traffic: Non zero rated LVS IPs - https://phabricator.wikimedia.org/T170518#3459827 (fgiunchedi) p:Triage>Normal
[10:18:28] Operations, Operations-Software-Development: New tool to track package updates/status for hosts and images (debmonitor) - https://phabricator.wikimedia.org/T167504#3459828 (fgiunchedi) p:Triage>Normal
[10:19:05] Operations, monitoring: Monitoring: add link to graph for Icinga timeseries alarms - https://phabricator.wikimedia.org/T167422#3459829 (fgiunchedi) p:Triage>Normal
[10:20:00] Operations, Kubernetes, Prod-Kubernetes (Experiment), User-Joe: Make security updates of docker images manageable - https://phabricator.wikimedia.org/T167269#3459830 (fgiunchedi) p:Triage>Normal
[10:20:09] Operations, Collection, OfflineContentGenerator, Reading-Community-Engagement, and 2 others: Replace OCG in collection extension with Electron - https://phabricator.wikimedia.org/T150872#3459831 (fgiunchedi) p:Triage>High
[10:20:16] Operations, Cassandra, Services (blocked), User-Joe, User-fgiunchedi: Hyperthreading disabled on restbase2002.codfw.wmnet & restbase1015.codfw.wmnet - https://phabricator.wikimedia.org/T162735#3459832 (fgiunchedi) p:Triage>Normal
[10:20:38] Operations, Mail: Exim panics when spamd reaches maxchildren - https://phabricator.wikimedia.org/T166291#3459833 (fgiunchedi) p:Triage>Normal
[10:20:50] Operations, Puppet, Release-Engineering-Team (Watching / External):
Integrate the puppet compiler in the puppet CI pipeline - https://phabricator.wikimedia.org/T166066#3459834 (fgiunchedi) p:Triage>Normal
[10:21:44] Operations, Cloud-Services, Security: labspuppetmaster security issues - https://phabricator.wikimedia.org/T171289#3459835 (faidon)
[10:28:59] moritzm, ema: I41081d6b04a9f0d40983dc168a8bb9954ce0c182 seems very manual, I worry that we'll keep forgetting to update the list of caches there
[10:29:10] is there perhaps a hiera key we could reference instead?
[10:29:31] cache::misc::nodes perhaps?
[10:33:00] (CR) Faidon Liambotis: [C: 1] "\o/" (1 comment) [puppet] - https://gerrit.wikimedia.org/r/366808 (https://phabricator.wikimedia.org/T79881) (owner: Filippo Giunchedi)
[10:35:48] (PS1) Muehlenhoff: profile::piwik::webserver: Include role::prometheus::apache_exporter [puppet] - https://gerrit.wikimedia.org/r/366827
[10:36:50] (PS3) Filippo Giunchedi: puppetmaster: stop serving private via fileserver [puppet] - https://gerrit.wikimedia.org/r/366808 (https://phabricator.wikimedia.org/T79881)
[10:37:16] (CR) Filippo Giunchedi: puppetmaster: stop serving private via fileserver (1 comment) [puppet] - https://gerrit.wikimedia.org/r/366808 (https://phabricator.wikimedia.org/T79881) (owner: Filippo Giunchedi)
[10:39:57] Operations, Gerrit: move gerrit.wm.org SSH service to private/behind LVS like phab-vcs - https://phabricator.wikimedia.org/T165631#3272220 (fgiunchedi) >>! In T165631#3272935, @demon wrote: > We can't move them behind LVS. Unlike Phabricator, which uses a separate hostname for the SSH service, Gerrit exp...
[10:40:01] (03PS1) 10Muehlenhoff: profile::otrs: Include role::prometheus::apache_exporter [puppet] - 10https://gerrit.wikimedia.org/r/366829 [10:40:04] 10Operations, 10Gerrit: move gerrit.wm.org SSH service to private/behind LVS like phab-vcs - https://phabricator.wikimedia.org/T165631#3459874 (10fgiunchedi) p:05Triage>03Normal [10:41:05] 10Operations, 10Traffic, 10Wikimedia-General-or-Unknown, 10I18n: wikimediafoundation.org's language selector is confusing to most visitors who don't have accounts there - https://phabricator.wikimedia.org/T166782#3459878 (10fgiunchedi) p:05Triage>03Normal [10:42:13] (03PS11) 10Jcrespo: prometheus: Convert mysqld-exporter into multi-instance [puppet] - 10https://gerrit.wikimedia.org/r/364396 (https://phabricator.wikimedia.org/T170666) [10:42:27] (03PS1) 10Elukey: role::prometheus::apache_exporter: move to profiles [puppet] - 10https://gerrit.wikimedia.org/r/366830 [10:42:31] moritzm: --^ [10:42:45] makes sense? So we'll include a profile rather than the role [10:43:10] (03CR) 10jerkins-bot: [V: 04-1] prometheus: Convert mysqld-exporter into multi-instance [puppet] - 10https://gerrit.wikimedia.org/r/364396 (https://phabricator.wikimedia.org/T170666) (owner: 10Jcrespo) [10:44:33] 10Operations, 10Cloud-Services: wikitech api list=novainstances not returning list of instances - https://phabricator.wikimedia.org/T171280#3459889 (10fgiunchedi) There's also a related alert for `novaadmin has roles in every project` which I believe it is related, asking for instances in a project not listed... 
[10:45:31] (03CR) 10Ladsgroup: Add /data/ Redirect for commons (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/360887 (https://phabricator.wikimedia.org/T163922) (owner: 10Ladsgroup) [10:46:36] (03PS12) 10Jcrespo: prometheus: Convert mysqld-exporter into multi-instance [puppet] - 10https://gerrit.wikimedia.org/r/364396 (https://phabricator.wikimedia.org/T170666) [10:49:33] (03CR) 10Filippo Giunchedi: "LGTM, modulo what PCC has to say" [puppet] - 10https://gerrit.wikimedia.org/r/366827 (owner: 10Muehlenhoff) [10:50:17] paravoid: but that applies to any entry in constants.pp; I prefer that over adding further places using @resolve(). but if ema and bblack prefer moving to cache::misc::nodes we can change that [10:50:31] we can do ipresolve() in puppet too [10:51:36] (03CR) 10Jcrespo: "https://puppet-compiler.wmflabs.org/compiler02/7122/dbstore2002.codfw.wmnet/ :" [puppet] - 10https://gerrit.wikimedia.org/r/364396 (https://phabricator.wikimedia.org/T170666) (owner: 10Jcrespo) [10:52:49] (03CR) 10Filippo Giunchedi: "See inline, I suspect some dsh groups can be read from puppet-generated ones?" (033 comments) [software/logstash-logback-encoder] - 10https://gerrit.wikimedia.org/r/366466 (https://phabricator.wikimedia.org/T116340) (owner: 10Mobrovac) [10:53:37] I'll check with them [10:54:02] elukey: sure, it's fairly unrelated, but we can do that, having a look in a bit [10:54:38] (03CR) 10Filippo Giunchedi: Add the Scap configuration (031 comment) [software/cassandra-metrics-collector] - 10https://gerrit.wikimedia.org/r/366404 (https://phabricator.wikimedia.org/T137371) (owner: 10Mobrovac) [10:55:42] 10Operations, 10Commons, 10media-storage, 10monitoring: Monitor [[Special:ListFiles]] for non 200 HTTP statuses in thumbnails - https://phabricator.wikimedia.org/T106937#1482087 (10fgiunchedi) @chasemp how would we setup the check and gauge how much it costs?
[10:55:43] !log oblivian@puppetmaster1001 conftool action : set/pooled=yes; selector: name=ocg1001.eqiad.wmnet [10:55:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:58:40] (03CR) 10Muehlenhoff: "PCC: http://puppet-compiler.wmflabs.org/7123/" [puppet] - 10https://gerrit.wikimedia.org/r/366827 (owner: 10Muehlenhoff) [11:01:00] (03PS13) 10Jcrespo: prometheus: Convert mysqld-exporter into multi-instance [puppet] - 10https://gerrit.wikimedia.org/r/364396 (https://phabricator.wikimedia.org/T170666) [11:04:40] (03CR) 10Daniel Kinzler: "I'm a bit unclear on how config changes are rolled out. This does not seem to be live, at least (03CR) 10Filippo Giunchedi: [C: 031] profile::piwik::webserver: Include role::prometheus::apache_exporter [puppet] - 10https://gerrit.wikimedia.org/r/366827 (owner: 10Muehlenhoff) [11:05:50] (03CR) 10Ladsgroup: "Daniel: This is beta cluster not test site, en.wikipedia.beta.org works just fine for me." [puppet] - 10https://gerrit.wikimedia.org/r/360891 (https://phabricator.wikimedia.org/T163922) (owner: 10Ladsgroup) [11:06:38] (03PS14) 10Jcrespo: prometheus: Convert mysqld-exporter into multi-instance [puppet] - 10https://gerrit.wikimedia.org/r/364396 (https://phabricator.wikimedia.org/T170666) [11:11:31] (03CR) 10Elukey: [C: 032] profile::piwik::webserver: Include role::prometheus::apache_exporter [puppet] - 10https://gerrit.wikimedia.org/r/366827 (owner: 10Muehlenhoff) [11:11:48] (03CR) 10Elukey: "argh sorry wrong one :)" [puppet] - 10https://gerrit.wikimedia.org/r/366827 (owner: 10Muehlenhoff) [11:12:25] (03PS15) 10Jcrespo: prometheus: Convert mysqld-exporter into multi-instance [puppet] - 10https://gerrit.wikimedia.org/r/364396 (https://phabricator.wikimedia.org/T170666) [11:13:11] moritzm: it is probably a OCD nit but I'd prefer to see a profile included in another one rather than a role [11:13:23] I can also remove the role completely [11:13:32] so we can only include the profile [11:15:13] 10Operations, 
10Gerrit: move gerrit.wm.org SSH service to private/behind LVS like phab-vcs - https://phabricator.wikimedia.org/T165631#3460065 (10fgiunchedi) Also in a master/slave configuration are the ssh host keys exposed by gerrit the same on both machines? Only slightly related to lvs but it just occurred... [11:16:23] elukey: sure, but since the entire role/profile migration is in motion and will take quite a while to complete, such cases will inevitably happen (but it's also a chance to align them as they come ofc( [11:19:24] (03PS2) 10Elukey: role::prometheus::apache_exporter: move to profiles [puppet] - 10https://gerrit.wikimedia.org/r/366830 [11:25:35] forgot the videoscalers [11:25:37] bad luca [11:26:40] (03PS3) 10Elukey: role::prometheus::apache_exporter: move to profiles [puppet] - 10https://gerrit.wikimedia.org/r/366830 [11:28:02] (03PS1) 10Filippo Giunchedi: librenms: enable graphite extension [puppet] - 10https://gerrit.wikimedia.org/r/366836 (https://phabricator.wikimedia.org/T171167) [11:29:02] (03CR) 10jerkins-bot: [V: 04-1] librenms: enable graphite extension [puppet] - 10https://gerrit.wikimedia.org/r/366836 (https://phabricator.wikimedia.org/T171167) (owner: 10Filippo Giunchedi) [11:30:33] 10Operations, 10monitoring, 10netops, 10Patch-For-Review, 10User-fgiunchedi: Evaluate LibreNMS' Graphite backend - https://phabricator.wikimedia.org/T171167#3460129 (10fgiunchedi) Space wise, librenms has ~15k rrds now, assuming it'll create the same number on the graphite side, at ~350k per whisper file... 
[11:30:35] (03CR) 10Elukey: "pcc looks good: https://puppet-compiler.wmflabs.org/compiler02/7128/" [puppet] - 10https://gerrit.wikimedia.org/r/366830 (owner: 10Elukey) [11:31:32] (03PS2) 10Filippo Giunchedi: librenms: enable graphite extension [puppet] - 10https://gerrit.wikimedia.org/r/366836 (https://phabricator.wikimedia.org/T171167) [11:32:21] (03PS1) 10Reedy: Fix up some file indenting broken by my phpcs changes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366837 (https://phabricator.wikimedia.org/T171282) [11:33:55] (03PS1) 10Urbanecm: Activate DynamicPageList on dewikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366838 (https://phabricator.wikimedia.org/T171293) [11:34:18] (03PS1) 10Reedy: Move some trailing ] onto newlines to make more balanced [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366839 [11:37:31] (03CR) 10Reedy: Function comments, parameters and stuffs (032 comments) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366771 (owner: 10Reedy) [11:37:58] (03PS3) 10Filippo Giunchedi: librenms: enable graphite extension [puppet] - 10https://gerrit.wikimedia.org/r/366836 (https://phabricator.wikimedia.org/T171167) [11:42:46] (03CR) 10jerkins-bot: [V: 04-1] librenms: enable graphite extension [puppet] - 10https://gerrit.wikimedia.org/r/366836 (https://phabricator.wikimedia.org/T171167) (owner: 10Filippo Giunchedi) [11:44:36] (03CR) 10Jcrespo: [C: 031] Fix up some file indenting broken by my phpcs changes [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366837 (https://phabricator.wikimedia.org/T171282) (owner: 10Reedy) [11:45:34] (03PS4) 10Filippo Giunchedi: librenms: enable graphite extension [puppet] - 10https://gerrit.wikimedia.org/r/366836 (https://phabricator.wikimedia.org/T171167) [11:47:37] 10Operations, 10ops-eqiad, 10User-fgiunchedi: Degraded RAID on ms-be1016 - https://phabricator.wikimedia.org/T171183#3460163 (10fgiunchedi) [11:49:27] 10Operations, 10ops-eqiad, 10User-fgiunchedi: Degraded RAID on 
ms-be1016 - https://phabricator.wikimedia.org/T171183#3456811 (10fgiunchedi) Also the controller was swapped not long ago in {T150206} [11:51:38] !log run compiler-update-facts [11:51:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:52:45] (03CR) 10Jcrespo: [C: 031] "I am ok with this, but I would wait for +1 from Marostegui in case there is some pooling state that may be wrong, so he can double check i" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366837 (https://phabricator.wikimedia.org/T171282) (owner: 10Reedy) [11:55:03] (03CR) 10Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/compiler02/7132/" [puppet] - 10https://gerrit.wikimedia.org/r/366836 (https://phabricator.wikimedia.org/T171167) (owner: 10Filippo Giunchedi) [11:56:41] 10Operations, 10Mail: Exim panics when spamd reaches maxchildren - https://phabricator.wikimedia.org/T166291#3291183 (10grin) I know I am lazy so I still haven't decyphered the configs how you handle spamd, but a few notes in the dark: * you can use //defer_ok// to let messages through in case of spamd failure... [12:00:17] (03CR) 10Ema: [C: 031] "Nice. FTR the librenms documentation also suggests a graphite config change: http://docs.librenms.org/Extensions/Graphite/" [puppet] - 10https://gerrit.wikimedia.org/r/366836 (https://phabricator.wikimedia.org/T171167) (owner: 10Filippo Giunchedi) [12:11:36] 10Operations, 10Traffic, 10Patch-For-Review: Explicitly limit varnishd transient storage - https://phabricator.wikimedia.org/T164768#3460185 (10ema) As of yesterday, [[https://github.com/wikimedia/operations-debs-varnish4/commit/837ce94afda2b55a394c69f554a97fe47f9e9dfa|varnish 4.1.7-1wm1]] is deployed on all... [12:36:12] (03CR) 10Muehlenhoff: [C: 031] "Confirmed, these are all limited to tracing with perf and seem safe to blacklist." 
[puppet] - 10https://gerrit.wikimedia.org/r/366548 (https://phabricator.wikimedia.org/T162612) (owner: 10Ema) [12:39:04] (03PS2) 10Ema: base::kernel: blacklist intel_cstate and intel_rapl_perf [puppet] - 10https://gerrit.wikimedia.org/r/366548 (https://phabricator.wikimedia.org/T162612) [12:39:11] (03CR) 10Ema: [V: 032 C: 032] base::kernel: blacklist intel_cstate and intel_rapl_perf [puppet] - 10https://gerrit.wikimedia.org/r/366548 (https://phabricator.wikimedia.org/T162612) (owner: 10Ema) [12:41:09] godog: FYI the compiler update facts was not yet updated AFAIK, so you might need to run it twice exporting the PUPPET_COMPILER with the hostname of the second jenkins slave [12:41:30] * volans|off off again ;) [12:50:49] !log rebooting cp* spares for kernel update [12:50:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:00:40] PROBLEM - Host cp1046 is DOWN: PING CRITICAL - Packet loss = 100% [13:01:18] ^fixing downtime [13:01:50] RECOVERY - Host cp1046 is UP: PING OK - Packet loss = 0%, RTA = 0.73 ms [13:05:06] (03CR) 10ArielGlenn: [C: 031] "what a great idea." [puppet] - 10https://gerrit.wikimedia.org/r/366525 (https://phabricator.wikimedia.org/T129222) (owner: 10Filippo Giunchedi) [13:23:15] volans|off: ah! thanks for the heads up [13:29:06] !log installing apache security updates on fermium/lists.wikimedia.org [13:29:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:35:50] PROBLEM - nutcracker process on thumbor1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:36:10] PROBLEM - dhclient process on thumbor1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:36:10] PROBLEM - salt-minion processes on thumbor1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [13:36:18] (03CR) 10Filippo Giunchedi: "See comment re: service_unit, the rest LGTM!" 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/364396 (https://phabricator.wikimedia.org/T170666) (owner: 10Jcrespo) [13:36:50] RECOVERY - nutcracker process on thumbor1003 is OK: PROCS OK: 1 process with UID = 111 (nutcracker), command name nutcracker [13:37:00] RECOVERY - dhclient process on thumbor1003 is OK: PROCS OK: 0 processes with command name dhclient [13:37:02] RECOVERY - salt-minion processes on thumbor1003 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion [13:39:53] !log installation apache security updates on hafnium, bromine, krypton, rutherfordium [13:39:54] (03PS5) 10Ema: varnish cachestats.py: cache statsd server IP [puppet] - 10https://gerrit.wikimedia.org/r/366564 (https://phabricator.wikimedia.org/T151643) [13:40:01] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:41:15] 10Operations, 10MediaWiki-extensions-Scribunto: Build and push a new hhvm-luasandbox package - https://phabricator.wikimedia.org/T171166#3456205 (10Anomie) See also {T171267} where Tim started to look at this independently. 
[13:54:20] PROBLEM - Disk space on ocg1003 is CRITICAL: DISK CRITICAL - free space: / 1683 MB (3% inode=88%) [14:00:01] <_joe_> oh man [14:00:07] <_joe_> ocg1003 again [14:00:19] <_joe_> well someone else fix it [14:00:38] <_joe_> I'm sick and tired to nanny these obsolete systems [14:04:41] (03CR) 10Filippo Giunchedi: [C: 031] varnish cachestats.py: cache statsd server IP [puppet] - 10https://gerrit.wikimedia.org/r/366564 (https://phabricator.wikimedia.org/T151643) (owner: 10Ema) [14:12:31] PROBLEM - novaadmin has roles in every project on labnet1001 is CRITICAL: In deployment-prep, user novaadmin should have roles [user, projectadmin] but has [uuser] [14:18:08] 10Operations, 10Cloud-Services: wikitech api list=novainstances not returning list of instances - https://phabricator.wikimedia.org/T171280#3460474 (10hashar) ``` lang=json $ curl 'https://wikitech.wikimedia.org/w/api.php?action=query&list=novainstances&niregion=eqiad&format=json&niproject=deployment-prep' | j... [14:20:47] (03CR) 10Elukey: [C: 031] varnish cachestats.py: cache statsd server IP [puppet] - 10https://gerrit.wikimedia.org/r/366564 (https://phabricator.wikimedia.org/T151643) (owner: 10Ema) [14:24:55] (03PS5) 10Filippo Giunchedi: librenms: enable graphite extension [puppet] - 10https://gerrit.wikimedia.org/r/366836 (https://phabricator.wikimedia.org/T171167) [14:25:49] _joe_ still looking into ocg1003? I can brutally flush the postmortem dir :D [14:25:54] <_joe_> elukey: yes [14:26:09] <_joe_> elukey: we did fuck up something and a reboot exposed it [14:26:35] 10Operations, 10Cloud-Services: wikitech api list=novainstances not returning list of instances - https://phabricator.wikimedia.org/T171280#3459555 (10Andrew) There was a brief period when novaadmin couldn't log in, is it possible you just caught it at a bad moment? The above curl seems ok to me now. 
[14:27:04] <_joe_> elukey: it's very fitting it alarms on a friday afternoon [14:27:26] _joe_ weird /srv is in the root partition [14:27:37] ah snap [14:27:44] <_joe_> elukey: yes, that's the issue [14:27:56] yes yes I didn't see it before, you are right [14:27:58] <_joe_> elukey: I'm fixing it [14:28:07] super thanks [14:28:28] (03CR) 10Ayounsi: [C: 031] librenms: enable graphite extension [puppet] - 10https://gerrit.wikimedia.org/r/366836 (https://phabricator.wikimedia.org/T171167) (owner: 10Filippo Giunchedi) [14:29:59] (03CR) 10Ema: [C: 032] varnish cachestats.py: cache statsd server IP [puppet] - 10https://gerrit.wikimedia.org/r/366564 (https://phabricator.wikimedia.org/T151643) (owner: 10Ema) [14:30:18] <_joe_> !log stopping ocg temporarily on ocg1003, T162780 [14:30:28] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:30:28] T162780: ocg1003 partitions are severely misconfigured - https://phabricator.wikimedia.org/T162780 [14:30:37] 10Operations, 10Cloud-Services: wikitech api list=novainstances not returning list of instances - https://phabricator.wikimedia.org/T171280#3460516 (10hashar) Yup because I have added `novaadmin` as a member of the `deployment-prep` tenant. But for `tools` it is still empty: ``` $ curl 'https://wikitech.wikim... [14:32:30] RECOVERY - Disk space on ocg1003 is OK: DISK OK [14:33:24] \o/ [14:33:25] <_joe_> !log ocg started again on ocg1003 [14:33:34] <_joe_> elukey: and now fstab has the partition [14:33:35] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:33:46] nice! 
[14:33:52] it definitely looks better now [14:34:02] /dev/mapper/ocg1003--vg-output 266G 32G 222G 13% /srv/deployment/ocg/output [14:34:12] 10Operations: ocg1003 partitions are severely misconfigured - https://phabricator.wikimedia.org/T162780#3460522 (10Joe) 05Open>03Resolved [14:38:22] 10Operations, 10Cloud-Services: wikitech api list=novainstances not returning list of instances - https://phabricator.wikimedia.org/T171280#3460543 (10fgiunchedi) >>! In T171280#3460500, @Andrew wrote: > There was a brief period when novaadmin couldn't log in, is it possible you just caught it at a bad moment... [14:45:53] (03PS5) 10Mobrovac: Add the Scap3 configuration [software/logstash-logback-encoder] - 10https://gerrit.wikimedia.org/r/366466 (https://phabricator.wikimedia.org/T116340) [14:46:33] (03PS5) 10Mobrovac: Add the Scap configuration [software/cassandra-metrics-collector] - 10https://gerrit.wikimedia.org/r/366404 (https://phabricator.wikimedia.org/T137371) [14:47:29] (03CR) 10Mobrovac: Add the Scap3 configuration (033 comments) [software/logstash-logback-encoder] - 10https://gerrit.wikimedia.org/r/366466 (https://phabricator.wikimedia.org/T116340) (owner: 10Mobrovac) [14:48:00] (03CR) 10Mobrovac: "@Filippo, {{done}} here as well" [software/cassandra-metrics-collector] - 10https://gerrit.wikimedia.org/r/366404 (https://phabricator.wikimedia.org/T137371) (owner: 10Mobrovac) [14:49:12] 10Operations, 10Cloud-Services: wikitech api list=novainstances not returning list of instances - https://phabricator.wikimedia.org/T171280#3460618 (10Andrew) I just can't think of any reason why those roles would've been removed :( investigating [14:51:57] 10Operations, 10ops-eqiad: Broken disk on mw1228 - https://phabricator.wikimedia.org/T168613#3460626 (10Joe) I agree with @MoritzMuehlenhoff - in general assume server assignment is correct unless otherwise stated by us. 
[14:52:10] 10Operations, 10ops-eqiad: Broken disk on mw1228 - https://phabricator.wikimedia.org/T168613#3460627 (10Joe) a:05Joe>03RobH [14:54:50] !log Restarting Jenkins [14:54:54] 10Operations, 10ops-codfw: mw2140.codfw.wmnet unreponsive, cannot be powercycled with serial console - https://phabricator.wikimedia.org/T166328#3460636 (10Joe) 05Open>03Resolved p:05Triage>03Normal [14:54:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:55:42] 10Operations, 10Prod-Kubernetes (Experiment), 10User-Joe: Build calico - https://phabricator.wikimedia.org/T150434#3460644 (10Joe) 05Open>03Resolved [14:56:15] (03CR) 10Filippo Giunchedi: Add the Scap3 configuration (031 comment) [software/logstash-logback-encoder] - 10https://gerrit.wikimedia.org/r/366466 (https://phabricator.wikimedia.org/T116340) (owner: 10Mobrovac) [14:56:40] 10Operations, 10ops-eqiad, 10OCG-General, 10Reading-Web-Backlog (Tracking), 10User-Joe: ocg1001 is broken - https://phabricator.wikimedia.org/T170886#3460647 (10Joe) I've re-put this server in the rotation for the load-balancer. 
[14:58:01] (03PS6) 10Mobrovac: Add the Scap configuration [software/cassandra-metrics-collector] - 10https://gerrit.wikimedia.org/r/366404 (https://phabricator.wikimedia.org/T137371) [14:58:26] (03PS6) 10Mobrovac: Add the Scap3 configuration [software/logstash-logback-encoder] - 10https://gerrit.wikimedia.org/r/366466 (https://phabricator.wikimedia.org/T116340) [14:59:24] !log installation apache security updates on krypton and auth* [14:59:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:59:38] (03CR) 10Mobrovac: Add the Scap3 configuration (031 comment) [software/logstash-logback-encoder] - 10https://gerrit.wikimedia.org/r/366466 (https://phabricator.wikimedia.org/T116340) (owner: 10Mobrovac) [15:00:31] (03CR) 10Filippo Giunchedi: [C: 031] Add the Scap3 configuration [software/logstash-logback-encoder] - 10https://gerrit.wikimedia.org/r/366466 (https://phabricator.wikimedia.org/T116340) (owner: 10Mobrovac) [15:00:51] (03CR) 10Filippo Giunchedi: [C: 031] Add the Scap configuration [software/cassandra-metrics-collector] - 10https://gerrit.wikimedia.org/r/366404 (https://phabricator.wikimedia.org/T137371) (owner: 10Mobrovac) [15:02:38] !log installation apache security updates on labmon1001 and netmon* [15:02:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:09:50] PROBLEM - puppet last run on phab2001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Package[apache2] [15:11:05] (03PS1) 10Muehlenhoff: role::horizon: Restrict to $CACHE_MISC [puppet] - 10https://gerrit.wikimedia.org/r/366858 [15:12:48] (03CR) 10Filippo Giunchedi: "LGTM, will it conflict with https://gerrit.wikimedia.org/r/#/c/366827/ ?" [puppet] - 10https://gerrit.wikimedia.org/r/366830 (owner: 10Elukey) [15:15:33] (03CR) 10Elukey: "Yep! 
It incorporates that code review and the other one for otrs, we'll do everything in one go (all no-ops)" [puppet] - 10https://gerrit.wikimedia.org/r/366830 (owner: 10Elukey) [15:16:21] !log restarting replication on db2072 after maintenance T151029 [15:16:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:16:30] T151029: duplicate key problems - https://phabricator.wikimedia.org/T151029 [15:17:33] I am going to disable notifications on db2072 for a while [15:17:40] in case the maintenance has broken something [15:22:17] (03PS1) 10Muehlenhoff: profile::docker::registry: Restrict to $CACHE_MISC [puppet] - 10https://gerrit.wikimedia.org/r/366859 [15:23:02] (03Abandoned) 10Muehlenhoff: profile::piwik::webserver: Include role::prometheus::apache_exporter [puppet] - 10https://gerrit.wikimedia.org/r/366827 (owner: 10Muehlenhoff) [15:23:22] (03Abandoned) 10Muehlenhoff: profile::otrs: Include role::prometheus::apache_exporter [puppet] - 10https://gerrit.wikimedia.org/r/366829 (owner: 10Muehlenhoff) [15:24:43] (03CR) 10Giuseppe Lavagetto: [C: 04-1] "We want to fetch the docker images directly internally, I guess." [puppet] - 10https://gerrit.wikimedia.org/r/366859 (owner: 10Muehlenhoff) [15:28:06] 10Operations, 10Gerrit: move gerrit.wm.org SSH service to private/behind LVS like phab-vcs - https://phabricator.wikimedia.org/T165631#3460756 (10demon) I was under the impression we couldn't do port-based LVS to the same domain. But I'm gladly willing to be wrong ☺️ And yes, same host key for the ssh daemon. [15:29:55] 10Operations, 10Traffic, 10Patch-For-Review: python-varnishapi daemons seeing "Log overrun" constantly - https://phabricator.wikimedia.org/T151643#3460773 (10ema) 05Open>03Resolved a:03ema The last overrun was logged about half an hour ago. ``` Jul 21 14:58:04 cp4009 varnishstatsd[46915]: Log overrun... 
[15:31:27] (03CR) 10Filippo Giunchedi: [C: 031] role::prometheus::apache_exporter: move to profiles [puppet] - 10https://gerrit.wikimedia.org/r/366830 (owner: 10Elukey) [15:33:12] 10Operations, 10ORES, 10Scoring-platform-team-Backlog, 10Graphite, 10User-fgiunchedi: Regularly purge old ores graphite metrics - https://phabricator.wikimedia.org/T169969#3460777 (10Halfak) great! We'll look into it. [15:35:37] 10Operations, 10Traffic: logster should not resolve statsd's IP every time it sends a metric - https://phabricator.wikimedia.org/T171318#3460781 (10ema) [15:36:30] 10Operations, 10Traffic: logster should not resolve statsd's IP every time it sends a metric - https://phabricator.wikimedia.org/T171318#3460793 (10ema) p:05Triage>03Normal [15:38:10] RECOVERY - puppet last run on phab2001 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [15:39:06] (03PS6) 10Andrew Bogott: Puppetmaster: Fix apache config ssldir [puppet] - 10https://gerrit.wikimedia.org/r/365053 [15:57:09] (03PS4) 10Elukey: role::prometheus::apache_exporter: move to profiles [puppet] - 10https://gerrit.wikimedia.org/r/366830 [16:17:29] 10Operations, 10DBA: Evaluate how hard would be to get aa(wikibooks|wiktionary) and howiki databases deleted - https://phabricator.wikimedia.org/T169928#3460902 (10MarcoAurelio) Maybe it's just me, but when it comes that a project is dead, absolutely dead, with no content and no chances of revival I feel delet... 
[16:23:12] (03PS1) 10Jcrespo: Add s1 instance to dbstore2002 (imported from db2072) [puppet] - 10https://gerrit.wikimedia.org/r/366865 (https://phabricator.wikimedia.org/T171321) [16:25:03] 10Operations, 10ops-eqiad, 10Analytics, 10User-Elukey: SATA errors for stat1004 in the dmesg - https://phabricator.wikimedia.org/T162770#3460952 (10elukey) [16:26:50] 10Operations, 10ops-eqiad, 10DC-Ops, 10Patch-For-Review, and 2 others: rack/setup/install restbase-dev100[456] - https://phabricator.wikimedia.org/T166181#3460969 (10Eevans) 05Open>03Resolved We're good on the #services side of things; Thanks for the help! [16:28:15] 10Operations, 10Traffic, 10User-Elukey: logster should not resolve statsd's IP every time it sends a metric - https://phabricator.wikimedia.org/T171318#3460975 (10elukey) [16:31:52] Hello, anybody to run a script here? [16:32:54] 10Operations, 10Traffic, 10User-Elukey: logster should not resolve statsd's IP every time it sends a metric - https://phabricator.wikimedia.org/T171318#3461012 (10elukey) Logster is run in the root crontab every minute, we pass the statsd hostname:port via: ``` # Sets up Logster to read from the Varnish... [16:34:19] It seems that T165813 requests namespaceDupes.php to be run. Reedy, Dereckson, thcipriani, anybody around in Friday evening? [16:34:19] T165813: Create Author: namespace on Tamil wikisource - https://phabricator.wikimedia.org/T165813 [16:34:33] yeah [16:35:43] Reedy, are you going to run it? ;) [16:35:48] It's running [16:35:56] 3327 links to fix, 3327 were resolvable. [16:35:58] That's crazy [16:36:03] Reedy, why? [16:36:14] That's a lot of links to a fake namespace [16:36:24] If they are resolvable... [16:36:33] !log run namespaceDupes.php against tawikisource T165813 [16:36:35] Yeah, it's clean now [16:36:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:37:22] Thank you! 
[16:42:19] (03PS1) 10Daniel Kinzler: Add P279 to $wgPropertySuggesterClassifyingPropertyIds [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366866 (https://phabricator.wikimedia.org/T169060) [16:50:27] (03PS1) 10Jforrester: Enable OOjs UI EditPage on all wikis except Commons [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366867 [16:50:29] (03PS1) 10Jforrester: Enable OOjs UI EditPage on all wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366868 [16:50:31] (03PS1) 10Jforrester: Remove setting no longer in MediaWiki [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366869 [16:51:07] (03CR) 10Jforrester: [C: 04-2] "Not until ~October." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366869 (owner: 10Jforrester) [16:51:18] (03CR) 10Jforrester: [C: 04-2] "Not until after Wikimania." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366868 (owner: 10Jforrester) [16:51:30] (03CR) 10Jforrester: [C: 04-1] "Planned for ~1 August." [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366867 (owner: 10Jforrester) [16:52:56] (03PS2) 10Daniel Kinzler: Add P279 to $wgPropertySuggesterClassifyingPropertyIds [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366866 (https://phabricator.wikimedia.org/T169060) [16:54:30] (03PS1) 10Ayounsi: Remove DNS records for unused IPs [dns] - 10https://gerrit.wikimedia.org/r/366871 [17:00:41] jouncebot: next [17:00:49] In 67 hour(s) and 59 minute(s): European Mid-day SWAT(Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170724T1300) [17:01:16] rofl [17:01:40] PROBLEM - Router interfaces on mr1-eqiad is CRITICAL: CRITICAL: host 208.80.154.199, interfaces up: 36, down: 1, dormant: 0, excluded: 0, unused: 0 [17:02:27] !log now that db2072 is compressed and fixed, stop it to finally clone it to dbstore2002 T171321 [17:02:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:02:38] T171321: Finish dbstore2002 migration to multi-instance - 
https://phabricator.wikimedia.org/T171321 [17:03:50] PROBLEM - Host mr1-eqiad.oob IPv6 is DOWN: CRITICAL - Destination Unreachable (2607:f6f0:205::153) [17:04:40] RECOVERY - Router interfaces on mr1-eqiad is OK: OK: host 208.80.154.199, interfaces up: 38, down: 0, dormant: 0, excluded: 0, unused: 0 [17:07:30] (03PS1) 10Dzahn: librenms: rsync direction netmon1002->netmon2001 [puppet] - 10https://gerrit.wikimedia.org/r/366873 (https://phabricator.wikimedia.org/T171018) [17:09:00] RECOVERY - Host mr1-eqiad.oob IPv6 is UP: PING OK - Packet loss = 0%, RTA = 2.90 ms [17:09:40] (03CR) 10Dzahn: [C: 032] librenms: rsync direction netmon1002->netmon2001 [puppet] - 10https://gerrit.wikimedia.org/r/366873 (https://phabricator.wikimedia.org/T171018) (owner: 10Dzahn) [17:12:18] (03PS1) 10MarcoAurelio: High density logos for the Spanish Wikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366875 (https://phabricator.wikimedia.org/T170604) [17:15:49] (03PS2) 10MarcoAurelio: High density logos for es.wikiquote [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366875 (https://phabricator.wikimedia.org/T170604) [17:17:11] 10Operations, 10hardware-requests, 10monitoring, 10Patch-For-Review: decom netmon1001 - https://phabricator.wikimedia.org/T171018#3461226 (10Dzahn) a:05Dzahn>03RobH This is ready now. [17:23:38] 10Operations, 10Phabricator, 10Release-Engineering-Team (Kanban): replace sdb and then setup/install phab1001.eqiad.wmnet - https://phabricator.wikimedia.org/T163938#3461260 (10Dzahn) https://icinga.wikimedia.org/cgi-bin/icinga/status.cgi?host=all&type=detail&servicestatustypes=16&hoststatustypes=3&servicepr... [17:23:49] (03PS1) 10Jcrespo: Revert "Set debug_level on icinga" [puppet] - 10https://gerrit.wikimedia.org/r/366876 [17:24:17] (03CR) 10Jcrespo: "We believe this is fixed, no need for debug logs anymore." 
[puppet] - 10https://gerrit.wikimedia.org/r/366876 (owner: 10Jcrespo) [17:24:34] (03PS2) 10MarcoAurelio: High density logos for es.wikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/365066 (https://phabricator.wikimedia.org/T170604) [17:25:10] oh f... [17:25:16] I messed that patch [17:25:24] (03CR) 10Dzahn: [C: 031] Revert "Set debug_level on icinga" [puppet] - 10https://gerrit.wikimedia.org/r/366876 (owner: 10Jcrespo) [17:25:55] (03CR) 10jerkins-bot: [V: 04-1] High density logos for es.wikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/365066 (https://phabricator.wikimedia.org/T170604) (owner: 10MarcoAurelio) [17:27:19] (03CR) 10Dzahn: [C: 04-1] Restrict HTTP access in role::librenms [puppet] - 10https://gerrit.wikimedia.org/r/366519 (owner: 10Muehlenhoff) [17:27:31] (03PS3) 10MarcoAurelio: High density logos for es.wikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/365066 (https://phabricator.wikimedia.org/T170604) [17:28:29] (03PS1) 10RobH: decom netmon1001 [puppet] - 10https://gerrit.wikimedia.org/r/366877 (https://phabricator.wikimedia.org/T171018) [17:30:02] (03Abandoned) 10MarcoAurelio: High density logos for es.wikiversity [mediawiki-config] - 10https://gerrit.wikimedia.org/r/365066 (https://phabricator.wikimedia.org/T170604) (owner: 10MarcoAurelio) [17:30:04] (03PS1) 10RobH: decom of netmon1001 production dns [dns] - 10https://gerrit.wikimedia.org/r/366878 (https://phabricator.wikimedia.org/T171018) [17:31:29] (03CR) 10RobH: [C: 032] decom netmon1001 [puppet] - 10https://gerrit.wikimedia.org/r/366877 (https://phabricator.wikimedia.org/T171018) (owner: 10RobH) [17:31:47] (03CR) 10RobH: [C: 032] decom of netmon1001 production dns [dns] - 10https://gerrit.wikimedia.org/r/366878 (https://phabricator.wikimedia.org/T171018) (owner: 10RobH) [17:35:37] 10Operations, 10ops-eqiad, 10hardware-requests, 10monitoring: decom netmon1001 - https://phabricator.wikimedia.org/T171018#3461338 (10RobH) p:05High>03Normal 
a:05RobH>03Cmjohnson [17:46:06] phab1001 may echo errors but im trying to clear some icinga issues with it [17:46:10] be aware its me. [17:48:17] (03CR) 10Daniel Kinzler: [C: 031] "@Alexandros I see no way forward from your CR-1. How can we resolve this?" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/360887 (https://phabricator.wikimedia.org/T163922) (owner: 10Ladsgroup) [17:49:47] robh: didn't look but might be because phab servers get more than one IP [17:49:55] also thakns for netmon [17:50:03] so it worked before the phab shit was applied via site.pp [17:50:06] so its likely that [17:50:14] yea, the role would add second IP [17:50:20] i suppose we have to hardcode ssh server to an ip in its config? [17:50:23] for ssh [17:50:27] im surprised the other phab systems didnt have this issue [17:50:33] or perhaps they did and it was a manual fix =[ [17:50:43] the secondary IP might have been added manually on the interface [17:50:45] systemctl status ssh.service doest give much detail [17:50:50] but there is puppet code too.. ehm.. [17:51:03] ifconfig only shows one ip [17:51:04] 10.64.16.8 [17:51:06] on eth0 [17:51:17] but two ipv6 [17:51:25] (03CR) 10Krinkle: Fix up some file indenting broken by my phpcs changes (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366837 (https://phabricator.wikimedia.org/T171282) (owner: 10Reedy) [17:51:32] phab1001-vcs.eqiad.wmnet. [17:51:44] 10.64.32.186 [17:51:58] i think that would be the second one for it [17:52:12] inet6 addr: 2620:0:861:102:10:64:16:8/64 Scope:Global [17:52:12] inet6 addr: 2620:0:861:103:10:64:32:186/128 Scope:Global [17:52:23] but only a single inet/ipv4 [17:52:33] i had to login via serial and root [17:52:35] ah, so that matches [17:53:24] oh, thats normal for ifconfig [17:53:31] robh: ugh, i see... 
that IP is on iridium [17:53:36] ohhh [17:53:42] that's bad [17:53:54] so remember how in the beginning that was all about "reinstall iridium as phab1001" [17:53:56] so fix dns and we're ok? [17:54:00] and we have that other ticket called that [17:54:06] and then plans were changed [17:54:07] or do we need to reimage or reip? [17:54:14] so back then it made sense [17:54:16] having two systems trying to snag an ip can cause issues [17:54:18] to give this the right name already [17:54:34] so this phab1001 may be causing havoc right now via stealing the same ip as another system [17:55:02] mutante: So I'm not following 100%, since I'm not sure how this second IP was added and why it's a duplicate? [17:55:14] it's not something we assign in dns, or we've assigned in dns and now we've applied to two systems? [17:55:17] (seems the latter?) [17:55:19] because iridium was supposed to be reinstalled and renamed to phab1001 [17:55:42] and then we added a new phab1001 instead because iridium is out of warranty right [17:55:49] my main concern is having this online with the same ip as the primary phab server could be breaking shit [17:56:02] yes, so let's remove the phab role from this [17:56:11] will that remove the ip? [17:56:13] and remove that second IPv6 address from the interface [17:56:26] no, but so we can remove it without it getting readded [17:57:20] so ifconfig eth0 inet6 del 2620:0:860:103:10:192:32:149/128 [17:57:27] is what we want removed right? [17:57:30] I'm in as root so i can now.
[17:57:37] no, the other one [17:57:39] ive halted puppet on it until you fix the role so it wont add back =] [17:57:43] 2620:0:861:103:10:64:32:186 [17:57:44] oh, indeed, sorry, [17:57:50] the one you pasted is 2001 [17:57:56] ifconfig eth0 inet6 del 2620:0:860:103:10:192:32:147/64 [17:58:00] oh [17:58:04] godamn it im in wrong window [17:58:07] closing that ;D [17:58:25] ifconfig eth0 inet6 del 2620:0:861:103:10:64:32:186/128 [17:58:30] so, i'm on iridium [17:58:40] and have 2620:0:861:103:10:64:32:186 [17:58:44] so that needs to go, right [17:58:46] yes [17:58:50] ok, removing off phab1001 now [17:59:17] ok, its gone and puppet is still disabled on it [17:59:25] so it wont pop back on until we're ready [17:59:29] ok, cool [17:59:33] restarting ssh manually, should be ok [17:59:50] it's not the regular ssh server [17:59:54] but the one that phab runs [18:00:03] that has this IP [18:00:14] yeah but phab1001 was refusing ssh [18:00:27] oh, right [18:00:41] and still is even though i remove dand then restarted networking... [18:00:49] maybe just reboot it [18:00:53] yeah [18:00:55] will do [18:01:05] rebooting it now [18:01:38] mutante: so yeah, that makes sense the iridium to phab rename, etc... [18:01:46] PROBLEM - Host phab1001 is DOWN: PING CRITICAL - Packet loss = 100% [18:01:48] so its going to take some cleanup on th backend before redeploying phab [18:01:53] role to phab1001 [18:01:54] it was like "don't call this iridium-vcs now" [18:01:58] and that point in time [18:02:03] yes [18:02:08] yep, makes perfect sense [18:02:14] we could put it back to "spare" role for a moment [18:02:27] likely the best thing to do until otherwise, just so it gets updates [18:02:35] yea [18:02:50] i just saw the phab comment about ssh and was all 'phab1001 again, why does this server hate me?' [18:02:51] heh [18:03:00] shit, still failing. 
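The mix-up above over which IPv6 address to delete is easier to follow once you notice that, in every address pasted in this exchange, the last four groups spell out an IPv4 address in decimal digits (10.64.32.186 appears as ...:10:64:32:186). A minimal sketch of reading that back out, assuming this spelling convention holds for the address in question; the function name `embedded_ipv4` is hypothetical and not part of any tool mentioned in the log:

```python
import ipaddress


def embedded_ipv4(addr):
    """Return the IPv4 address spelled out in the last four groups of an
    IPv6 address, or None if those groups do not read as decimal octets.

    Assumption: the host follows the convention visible in the log, where
    each of the last four groups is written with the decimal digits of one
    IPv4 octet. This is not a general IPv6 property.
    """
    text = addr.split("/")[0]  # drop a prefix length such as /128
    groups = ipaddress.IPv6Address(text).exploded.split(":")[-4:]
    try:
        # Read the hex digits as if they were decimal; hex letters fail here.
        octets = [int(group) for group in groups]
    except ValueError:
        return None
    if any(octet > 255 for octet in octets):
        return None
    return ".".join(str(octet) for octet in octets)


if __name__ == "__main__":
    # The three addresses that appear in the exchange above.
    for candidate in (
        "2620:0:861:103:10:64:32:186/128",
        "2620:0:860:103:10:192:32:149/128",
        "2620:0:861:102:10:64:16:8/64",
    ):
        print(candidate, "->", embedded_ipv4(candidate))
```

Run against the pasted addresses, the first one maps to the 10.64.32.186 address quoted earlier for phab1001-vcs, while the one from the wrong terminal window maps to a different IPv4 address entirely, which is a quick sanity check before deleting anything from an interface.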
[18:03:16] RECOVERY - Host phab1001 is UP: PING OK - Packet loss = 0%, RTA = 0.25 ms [18:03:45] ● ssh.service - OpenBSD Secure Shell server [18:03:45] Loaded: loaded (/lib/systemd/system/ssh.service; enabled) [18:03:45] Active: failed (Result: start-limit) since Fri 2017-07-21 18:02:56 UTC; 34s ago [18:03:47] Process: 937 ExecStart=/usr/sbin/sshd -D $SSHD_OPTS (code=exited, status=255) [18:03:49] Main PID: 937 (code=exited, status=255) [18:04:10] Jul 21 18:02:56 phab1001 systemd[1]: Unit ssh.service entered failed state. [18:04:10] Jul 21 18:02:56 phab1001 systemd[1]: ssh.service holdoff time over, scheduling restart. [18:04:11] Jul 21 18:02:56 phab1001 systemd[1]: Stopping OpenBSD Secure Shell server... [18:04:12] Jul 21 18:02:56 phab1001 systemd[1]: Starting OpenBSD Secure Shell server... [18:04:14] Jul 21 18:02:56 phab1001 systemd[1]: ssh.service start request repeated too quickly, refusing to start. [18:04:16] Jul 21 18:02:56 phab1001 systemd[1]: Failed to start OpenBSD Secure Shell server. [18:04:18] Jul 21 18:02:56 phab1001 systemd[1]: Unit ssh.service entered failed state. [18:04:20] without the ... [18:04:39] same thing if i fire manually [18:04:42] same error output [18:04:59] mutante: we may wanna just reimage it rather than troubleshoot ;] [18:05:13] since its going to require tweaking to the role [18:05:26] its not doing anythign yet so no harm, no foul ;] [18:05:38] it wouldnt make a difference and might be the cleanest/easiest yea [18:05:48] im just gonna do that now [18:05:51] heh [18:05:52] ok [18:05:55] can you do ps for role spare? 
[18:06:04] ok [18:06:05] (03PS1) 10Phuedx: pagePreviews: Increase instrumentation sampling rate [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366882 (https://phabricator.wikimedia.org/T171325) [18:09:08] (03PS1) 10Dzahn: phabricator: make phab1001 use role::spare for now [puppet] - 10https://gerrit.wikimedia.org/r/366885 [18:09:46] pxe booting it now, so itll be back soonish [18:09:49] (03PS2) 10Dzahn: phabricator: make phab1001 use role::spare for now [puppet] - 10https://gerrit.wikimedia.org/r/366885 (https://phabricator.wikimedia.org/T163938) [18:10:10] of course, yesterday i reimagd it, and didnt realize the bios was set to the wrong boot order [18:10:22] so it just reimaged itself when i finished, this server likes being difficult ;D [18:10:38] heh [18:10:50] i'm merging that and then changing location really quick [18:11:38] (03PS4) 10Dzahn: phabricator: make phab1001 use role::spare for now [puppet] - 10https://gerrit.wikimedia.org/r/366885 (https://phabricator.wikimedia.org/T163938) [18:15:02] (03CR) 10Dzahn: [C: 032] phabricator: make phab1001 use role::spare for now [puppet] - 10https://gerrit.wikimedia.org/r/366885 (https://phabricator.wikimedia.org/T163938) (owner: 10Dzahn) [18:17:26] (03PS1) 10Ladsgroup: mediawiki: increase the maximum time of dispatchChanges cronjob [puppet] - 10https://gerrit.wikimedia.org/r/366887 (https://phabricator.wikimedia.org/T171263) [18:24:15] 10Operations, 10Phabricator, 10Patch-For-Review, 10Release-Engineering-Team (Kanban): replace sdb and then setup/install phab1001.eqiad.wmnet - https://phabricator.wikimedia.org/T163938#3461462 (10RobH) So this had an issue when the new role assigned an IP to it that was in use in iridium. So we've put it... 
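The `systemctl status ssh.service` paste earlier in this thread shows `Active: failed (Result: start-limit)`, which only says that systemd gave up restarting the unit; the underlying failure is the sshd process exiting with status 255. A small sketch of pulling the state and result out of such a paste, for example when scanning saved output from several hosts. The function name `unit_state` is hypothetical, and the regular expression only assumes the `Active:` line format seen in the log:

```python
import re


def unit_state(status_text):
    """Extract (state, result) from the 'Active:' line of systemctl status
    output. result is None when no '(Result: ...)' annotation is present;
    both are None when there is no 'Active:' line at all."""
    match = re.search(
        r"Active:\s+(\S+)(?:\s+\(Result:\s+([^)]+)\))?", status_text
    )
    if match is None:
        return None, None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    # Sample copied from the output pasted in the log.
    sample = (
        "Loaded: loaded (/lib/systemd/system/ssh.service; enabled)\n"
        "Active: failed (Result: start-limit) since Fri 2017-07-21 18:02:56 UTC; 34s ago\n"
        "Process: 937 ExecStart=/usr/sbin/sshd -D $SSHD_OPTS (code=exited, status=255)\n"
    )
    print(unit_state(sample))
```

A `start-limit` result is a cue to look at the daemon's own exit status and logs rather than at systemd itself.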
[19:06:19] (03Abandoned) 10Urbanecm: Activate DynamicPageList on dewikivoyage [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366838 (https://phabricator.wikimedia.org/T171293) (owner: 10Urbanecm) [19:16:00] (03PS1) 10Dzahn: cache::misc: rename director for phabricator from iridium [puppet] - 10https://gerrit.wikimedia.org/r/366893 [19:17:45] (03CR) 10Jforrester: "Code using this was removed in wmf/1.30.0-wmf.6, so this should be good to go whenever. SWAT it on Monday?" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/358415 (https://phabricator.wikimedia.org/T165018) (owner: 10Pmiazga) [19:20:19] (03CR) 10Paladox: [C: 031] cache::misc: rename director for phabricator from iridium [puppet] - 10https://gerrit.wikimedia.org/r/366893 (owner: 10Dzahn) [19:21:18] (03PS2) 10Dzahn: cache::misc: rename director for phabricator from iridium [puppet] - 10https://gerrit.wikimedia.org/r/366893 [19:27:22] (03CR) 10Dzahn: [C: 032] cache::misc: rename director for phabricator from iridium [puppet] - 10https://gerrit.wikimedia.org/r/366893 (owner: 10Dzahn) [19:34:23] (03PS1) 10Dzahn: cache::misc: remove now unused phab director iridium [puppet] - 10https://gerrit.wikimedia.org/r/366896 [19:39:41] (03PS2) 10Dzahn: cache::misc: remove now unused phab director iridium [puppet] - 10https://gerrit.wikimedia.org/r/366896 [19:39:50] (03CR) 10Paladox: [C: 031] cache::misc: remove now unused phab director iridium [puppet] - 10https://gerrit.wikimedia.org/r/366896 (owner: 10Dzahn) [19:43:04] 10Operations, 10vm-requests: VM request for RelEng pwstore - https://phabricator.wikimedia.org/T171342#3461689 (10thcipriani) [19:43:27] 10Operations, 10Release-Engineering-Team, 10vm-requests: VM request for RelEng pwstore - https://phabricator.wikimedia.org/T171342#3461703 (10thcipriani) [19:48:36] 10Operations, 10Release-Engineering-Team, 10vm-requests: VM request for RelEng pwstore - https://phabricator.wikimedia.org/T171342#3461712 (10Dzahn) I'm wondering if it can just live next to the 
existing pwstore repo that ops uses. It's just a repo and everything is encrypted with GPG and releng probably alr... [19:49:54] 10Operations, 10Release-Engineering-Team, 10vm-requests: VM request for RelEng pwstore - https://phabricator.wikimedia.org/T171342#3461714 (10Dzahn) actually, it may even make sense to do it all in a single pwstore repo and just add a new group for releng. [19:51:26] (03CR) 10Dzahn: [C: 032] cache::misc: remove now unused phab director iridium [puppet] - 10https://gerrit.wikimedia.org/r/366896 (owner: 10Dzahn) [19:54:19] (03CR) 10Dzahn: [C: 04-1] "talked about this on IRC and it seems a simpler solution to just have "gerrit.config.erb" and "gerrit.test.config.erb" templates. allows f" [puppet] - 10https://gerrit.wikimedia.org/r/366768 (owner: 10Paladox) [19:55:12] hi [19:55:36] mutante: around? [19:56:22] aude: hello [19:56:37] i need to update my ssh key [19:56:38] PROBLEM - puppet last run on labtestservices2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [19:56:50] https://gerrit.wikimedia.org/r/#/c/363180/ [19:57:10] several wikidata people verified / +1 [19:57:31] and i put it on the swat page (but couldn't be around yesterday) [19:57:32] https://wikitech.wikimedia.org/w/index.php?title=Deployments&type=revision&diff=1765664&oldid=1765661 [19:57:40] (where i have 2 factor auth) [19:58:13] 10Operations, 10Phabricator, 10Patch-For-Review, 10Release-Engineering-Team (Kanban): replace sdb and then setup/install phab1001.eqiad.wmnet - https://phabricator.wikimedia.org/T163938#3461727 (10RobH) [19:58:35] 10Operations, 10Phabricator, 10Patch-For-Review, 10Release-Engineering-Team (Kanban): replace sdb and then setup/install phab1001.eqiad.wmnet - https://phabricator.wikimedia.org/T163938#3215445 (10RobH) a:05RobH>03Dzahn back to @dzahn for service implementation [19:58:46] 10Operations, 10Phabricator, 10Release-Engineering-Team (Kanban): replace sdb and then setup/install 
phab1001.eqiad.wmnet - https://phabricator.wikimedia.org/T163938#3461743 (10RobH) [19:58:49] !log Restarting Jenkins [19:58:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:59:17] (03PS5) 10Dzahn: admins: Update aude's ssh key [puppet] - 10https://gerrit.wikimedia.org/r/363180 (owner: 10Aude) [19:59:22] (03PS6) 10Dzahn: admins: Update aude's ssh key [puppet] - 10https://gerrit.wikimedia.org/r/363180 (owner: 10Aude) [19:59:48] (03CR) 10Dzahn: [C: 032] "confirmed this reverts https://gerrit.wikimedia.org/r/#/c/207079/ which had confirmations" [puppet] - 10https://gerrit.wikimedia.org/r/363180 (owner: 10Aude) [20:00:01] thanks :) [20:00:04] aude: yes, makes sense and i confirmed it reverts that [20:00:06] np [20:00:56] there are several tasks i need to work on that probably require access again [20:01:07] which server do you need first? [20:01:22] probably terbium but doesn't matter [20:01:25] i can run puppet there.. or you can just get a coffee [20:01:34] i can wati [20:01:36] wait* [20:02:41] runs puppet on all bastion hosts.. terbium already done [20:02:46] ok [20:02:49] let me try... 
[20:04:21] it won't work yet, it's just kind of slow running [20:04:45] 2/4 done, depending which one you use [20:06:20] ok, it works now [20:07:01] cool, and i see an error about bast3002 , heh, but that's another thing [20:07:14] ok [20:07:49] i had to clear my known hosts since those were ~2 years old [20:08:43] 10Operations, 10Release-Engineering-Team, 10vm-requests: VM request for RelEng pwstore - https://phabricator.wikimedia.org/T171342#3461689 (10greg) Yeah, whatever works, we didn't want to over-presume on Ops' part :) [20:08:59] 10Operations, 10vm-requests, 10Release-Engineering-Team (Watching / External): VM request for RelEng pwstore - https://phabricator.wikimedia.org/T171342#3461782 (10greg) [20:10:16] aude: if you want to actually check the ssh fingerprints btw, https://wikitech.wikimedia.org/wiki/Help:SSH_Fingerprints [20:11:16] ok thanks [20:14:48] RECOVERY - puppet last run on labtestservices2001 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [20:19:45] (03Draft1) 10Paladox: Gerrit: Remove ldap user and password from secure.config [puppet] - 10https://gerrit.wikimedia.org/r/366910 [20:19:48] (03PS2) 10Paladox: Gerrit: Remove ldap user and password from secure.config [puppet] - 10https://gerrit.wikimedia.org/r/366910 [20:19:52] (03PS1) 10Dzahn: gerrit: make name of config template flexible [puppet] - 10https://gerrit.wikimedia.org/r/366911 [20:20:33] (03CR) 10Paladox: gerrit: make name of config template flexible (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/366911 (owner: 10Dzahn) [20:21:11] (03CR) 10jerkins-bot: [V: 04-1] gerrit: make name of config template flexible [puppet] - 10https://gerrit.wikimedia.org/r/366911 (owner: 10Dzahn) [20:22:41] 10Operations, 10Wikidata, 10Wikimedia-Language-setup, 10Wikimedia-Site-requests: Add Dinka Wikipedia to Wikidata - https://phabricator.wikimedia.org/T170930#3461803 (10aude) for some reason, the sites table on dinwiki only had an entry for dinwiki and not any 
other wiki. I see dinwiki in the sites table f... [20:22:46] already fixed one issue :) [20:22:59] aude: :) [20:23:04] paladox: hah, 1G , thx [20:23:10] lol your welcome :) [20:23:11] (03PS2) 10Dzahn: gerrit: make name of config template flexible [puppet] - 10https://gerrit.wikimedia.org/r/366911 [20:23:15] vim command :) [20:23:20] :) [20:23:42] (03CR) 10Paladox: [C: 031] gerrit: make name of config template flexible [puppet] - 10https://gerrit.wikimedia.org/r/366911 (owner: 10Dzahn) [20:23:49] :) [20:24:27] (03CR) 10jerkins-bot: [V: 04-1] gerrit: make name of config template flexible [puppet] - 10https://gerrit.wikimedia.org/r/366911 (owner: 10Dzahn) [20:27:10] (03PS3) 10Dzahn: gerrit: make name of config template flexible [puppet] - 10https://gerrit.wikimedia.org/r/366911 [20:27:56] (03CR) 10Paladox: [C: 031] gerrit: make name of config template flexible (033 comments) [puppet] - 10https://gerrit.wikimedia.org/r/366911 (owner: 10Dzahn) [20:28:03] (03CR) 10jerkins-bot: [V: 04-1] gerrit: make name of config template flexible [puppet] - 10https://gerrit.wikimedia.org/r/366911 (owner: 10Dzahn) [20:28:17] lol,jerkins won't agree with you [20:28:56] mutante ignore my comment about the hiera file :), realised that the hiera config will overide it [20:29:27] i did that because i'm not supposed to have defaults with the hiera lookup in profile [20:29:30] ok [20:29:42] 2 syntax errors :p fixing [20:30:46] (03PS4) 10Dzahn: gerrit: make name of config template flexible [puppet] - 10https://gerrit.wikimedia.org/r/366911 [20:31:42] (03CR) 10jerkins-bot: [V: 04-1] gerrit: make name of config template flexible [puppet] - 10https://gerrit.wikimedia.org/r/366911 (owner: 10Dzahn) [20:32:07] .. 
[20:32:08] (03CR) 10Paladox: gerrit: make name of config template flexible (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/366911 (owner: 10Dzahn) [20:33:31] (03PS5) 10Dzahn: gerrit: make name of config template flexible [puppet] - 10https://gerrit.wikimedia.org/r/366911 [20:34:03] that's a 'full house'. when you manage to add a syntax error in each separate file... duh [20:36:03] (03CR) 10Paladox: [C: 031] gerrit: make name of config template flexible [puppet] - 10https://gerrit.wikimedia.org/r/366911 (owner: 10Dzahn) [20:36:05] passes [20:36:06] heh [20:38:21] compiles and fails :p [20:38:38] PROBLEM - novaadmin has roles in every project on labtestnet2001 is CRITICAL: In bastion, user novaadmin should have roles [user, projectadmin] but has [uuser] [20:41:19] (03PS6) 10Smalyshev: logstash: Parse nginx access logs for wdqs [puppet] - 10https://gerrit.wikimedia.org/r/299825 (owner: 10BryanDavis) [20:42:13] (03PS6) 10Dzahn: gerrit: make name of config template flexible [puppet] - 10https://gerrit.wikimedia.org/r/366911 [20:43:19] (03CR) 10Chad: "What? We don't connect with the users' own credentials." [puppet] - 10https://gerrit.wikimedia.org/r/366910 (owner: 10Paladox) [20:43:59] (03CR) 10Paladox: "> What? We don't connect with the users' own credentials." [puppet] - 10https://gerrit.wikimedia.org/r/366910 (owner: 10Paladox) [20:44:46] 10Operations, 10vm-requests, 10Release-Engineering-Team (Watching / External): VM request for RelEng pwstore - https://phabricator.wikimedia.org/T171342#3461689 (10demon) >>! In T171342#3461714, @Dzahn wrote: > actually, it may even make sense to do it all in a single pwstore repo and just add a new group fo... [20:46:16] (03CR) 10Chad: "A better option would be to configure labs to use DEVELOPMENT_BECOME_ANY_ACCOUNT instead of LDAP." 
[puppet] - 10https://gerrit.wikimedia.org/r/366768 (owner: 10Paladox) [20:46:59] 10Operations, 10vm-requests, 10Release-Engineering-Team (Watching / External): VM request for RelEng pwstore - https://phabricator.wikimedia.org/T171342#3461844 (10Dzahn) Should be feasible since we already added separate access group for dc-ops before (T158285). [20:47:01] (03CR) 10Paladox: "> A better option would be to configure labs to use DEVELOPMENT_BECOME_ANY_ACCOUNT" [puppet] - 10https://gerrit.wikimedia.org/r/366768 (owner: 10Paladox) [20:47:18] (03CR) 10Paladox: "i've setup ldap anyways." [puppet] - 10https://gerrit.wikimedia.org/r/366768 (owner: 10Paladox) [20:51:06] (03CR) 10Paladox: "Also perfect timing to split the config file to allow us to test configs without doing for prod." [puppet] - 10https://gerrit.wikimedia.org/r/366768 (owner: 10Paladox) [20:51:20] (03CR) 10BryanDavis: "I *think* an anon bind to the LDAP tree can see and do all of the same things that the proxyagent user can. The only difference may be the" [puppet] - 10https://gerrit.wikimedia.org/r/366910 (owner: 10Paladox) [20:51:30] (03PS7) 10Smalyshev: logstash: Parse nginx access logs for wdqs [puppet] - 10https://gerrit.wikimedia.org/r/299825 (owner: 10BryanDavis) [20:51:36] (03CR) 10Smalyshev: [C: 031] logstash: Parse nginx access logs for wdqs [puppet] - 10https://gerrit.wikimedia.org/r/299825 (owner: 10BryanDavis) [20:51:38] (03CR) 10Chad: "By whom? Where? There's no task. What's the context here? This has worked for years :)" [puppet] - 10https://gerrit.wikimedia.org/r/366910 (owner: 10Paladox) [20:51:57] (03CR) 10Paladox: "> By whom? Where? There's no task. What's the context here? This has" [puppet] - 10https://gerrit.wikimedia.org/r/366910 (owner: 10Paladox) [20:52:42] (03CR) 10BryanDavis: "I did not encourage submission of this patch. It was just something we came across in setting up a testing server in a Cloud VPS project." 
[puppet] - 10https://gerrit.wikimedia.org/r/366910 (owner: 10Paladox) [20:53:44] (03PS7) 10Dzahn: gerrit: make name of config template flexible [puppet] - 10https://gerrit.wikimedia.org/r/366911 [20:54:19] (03CR) 10Chad: [C: 04-1] "This doesn't make sense. An alternative config file won't be loaded by gerrit. If you want to test a config change in labs, just disable p" [puppet] - 10https://gerrit.wikimedia.org/r/366911 (owner: 10Dzahn) [20:54:41] (03CR) 10jerkins-bot: [V: 04-1] gerrit: make name of config template flexible [puppet] - 10https://gerrit.wikimedia.org/r/366911 (owner: 10Dzahn) [20:55:07] 10Operations, 10vm-requests, 10Release-Engineering-Team (Watching / External): VM request for RelEng pwstore - https://phabricator.wikimedia.org/T171342#3461689 (10MoritzMuehlenhoff) There's really no need for a separate VM, a pwstore is just a git repo with a few megabytes of data :-) I suggest you simply... [20:55:14] (03CR) 10Dzahn: "permanently disabling puppet doesn't seem to be a great solution for labs though" [puppet] - 10https://gerrit.wikimedia.org/r/366911 (owner: 10Dzahn) [20:56:28] (03CR) 10Dzahn: "why would it not be loaded by gerrit, we are just changing the name of the source template not the destination file" [puppet] - 10https://gerrit.wikimedia.org/r/366911 (owner: 10Dzahn) [21:02:01] (03PS2) 10Jcrespo: Add s1 instance to dbstore2002 (imported from db2072) [puppet] - 10https://gerrit.wikimedia.org/r/366865 (https://phabricator.wikimedia.org/T171321) [21:03:21] (03CR) 10Jcrespo: [C: 032] Add s1 instance to dbstore2002 (imported from db2072) [puppet] - 10https://gerrit.wikimedia.org/r/366865 (https://phabricator.wikimedia.org/T171321) (owner: 10Jcrespo) [21:08:57] (03CR) 10Dzahn: [C: 04-1] "my counter proposal is https://gerrit.wikimedia.org/r/#/c/366911/" [puppet] - 10https://gerrit.wikimedia.org/r/366768 (owner: 10Paladox) [21:12:57] (03CR) 10Chad: "So I didn't understand the context of this, and it makes a little more sense now that I've looked into 
it. Ok....this could work. Anon LDA" [puppet] - 10https://gerrit.wikimedia.org/r/366910 (owner: 10Paladox) [21:14:57] 10Operations, 10vm-requests, 10Release-Engineering-Team (Watching / External): VM request for RelEng pwstore - https://phabricator.wikimedia.org/T171342#3461959 (10demon) >>! In T171342#3461910, @MoritzMuehlenhoff wrote: > There's really no need for a separate VM, a pwstore is just a git repo with a few mega... [21:17:03] 10Operations, 10vm-requests, 10Release-Engineering-Team (Watching / External): VM request for RelEng pwstore - https://phabricator.wikimedia.org/T171342#3461960 (10Dzahn) The other deployers would still have to decrypt the encrypted files to actually see content though, so it's not really sharing secrets wit... [21:19:45] (03CR) 10Chad: "Oh ok, I misread what this did. Fine by me." [puppet] - 10https://gerrit.wikimedia.org/r/366911 (owner: 10Dzahn) [21:29:03] 10Operations, 10vm-requests, 10Release-Engineering-Team (Watching / External): VM request for RelEng pwstore - https://phabricator.wikimedia.org/T171342#3461968 (10demon) >>! In T171342#3461960, @Dzahn wrote: > The other deployers would still have to decrypt the encrypted files to actually see content though... 
[21:29:50] !log dropping enwiki database from dbstore2002:3306 (default instance) - new s1 already imported on 3311 [21:30:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:31:13] I hope I did that ^on the right host [21:33:24] (03PS30) 10Paladox: Gerrit: Add support for scap [puppet] - 10https://gerrit.wikimedia.org/r/363726 (https://phabricator.wikimedia.org/T157414) [21:33:45] heh @ jynus :) [21:34:39] https://grafana.wikimedia.org/dashboard/file/server-board.json?refresh=1m&panelId=17&fullscreen&orgId=1&var-server=dbstore2002 [21:35:55] wow, nice bump [21:37:56] as long as it is only on dbstore2002 and not on the other hosts [21:38:38] same graphe for dbstore1001 looks straight :) [21:39:08] (03PS1) 10Ottomata: Temporarily disable webrequest deletion while lawyers do some research [puppet] - 10https://gerrit.wikimedia.org/r/366966 [21:40:07] (03CR) 10jerkins-bot: [V: 04-1] Temporarily disable webrequest deletion while lawyers do some research [puppet] - 10https://gerrit.wikimedia.org/r/366966 (owner: 10Ottomata) [21:41:28] (03PS2) 10Ottomata: Temporarily disable webrequest deletion while lawyers do some research [puppet] - 10https://gerrit.wikimedia.org/r/366966 [21:43:50] (03CR) 10Ottomata: [C: 032] Temporarily disable webrequest deletion while lawyers do some research [puppet] - 10https://gerrit.wikimedia.org/r/366966 (owner: 10Ottomata) [21:53:04] 10Operations, 10Cloud-Services: wikitech api list=novainstances not returning list of instances - https://phabricator.wikimedia.org/T171280#3462030 (10Andrew) [21:58:39] 10Operations, 10vm-requests, 10Release-Engineering-Team (Watching / External): VM request for RelEng pwstore - https://phabricator.wikimedia.org/T171342#3462054 (10thcipriani) 05Open>03declined >>! In T171342#3461960, @Dzahn wrote: > The other deployers would still have to decrypt the encrypted files to... 
[22:02:14] (03PS10) 10Paladox: gerrit: DO NOT MERGE [software/gerrit] - 10https://gerrit.wikimedia.org/r/363738 [22:02:16] (03PS9) 10Paladox: Gerrit: Upgrading gerrit to 2.14.2 (DO NOT MERGE) [software/gerrit] - 10https://gerrit.wikimedia.org/r/363734 [22:06:13] 10Operations, 10vm-requests, 10Release-Engineering-Team (Watching / External): VM request for RelEng pwstore - https://phabricator.wikimedia.org/T171342#3462088 (10Dzahn) >>! In T171342#3462054, @thcipriani wrote: > An extra layer of security in the case of storing passwords is prudent and not unreasonable.... [22:07:52] 10Operations, 10vm-requests, 10Release-Engineering-Team (Watching / External): VM request for RelEng pwstore - https://phabricator.wikimedia.org/T171342#3462091 (10Dzahn) sorry, you said they are still encrypted. so still using pwstore but with phab repo. gotcha then.! [22:12:07] 10Operations, 10hardware-requests: hardware request for netmon1001 replacement - https://phabricator.wikimedia.org/T156040#3462100 (10Dzahn) [22:12:10] 10Operations, 10monitoring, 10Patch-For-Review: setup netmon1002.wikimedia.org - https://phabricator.wikimedia.org/T159756#3462098 (10Dzahn) 05Open>03Resolved There was an issue with rancid logging in on switches/routers. ssh-agent refused operation. thanks to thcipriani pointing out sometimes you have... [22:12:59] 10Operations, 10monitoring, 10Patch-For-Review: rack/setup/install netmon2001 - https://phabricator.wikimedia.org/T166180#3462102 (10Dzahn) 05Open>03Resolved [22:13:14] 10Operations, 10monitoring, 10Patch-For-Review: rack/setup/install netmon2001 - https://phabricator.wikimedia.org/T166180#3287592 (10Dzahn) [22:13:55] RECOVERY - novaadmin has roles in every project on labnet1001 is OK: novaadmin has the correct roles in all projects. 
[22:16:13] 10Operations, 10Cloud-Services: wikitech api list=novainstances not returning list of instances - https://phabricator.wikimedia.org/T171280#3462124 (10Andrew) 05Open>03Resolved a:03Andrew I have a fix to prevent this from happening again... in the meantime I've added novaadmin back to everything. [22:19:13] (03PS8) 10Dzahn: gerrit: make name of config template flexible [puppet] - 10https://gerrit.wikimedia.org/r/366911 [22:21:53] (03CR) 10Dzahn: [C: 032] "jenkins and compiler like it now http://puppet-compiler.wmflabs.org/7136/" [puppet] - 10https://gerrit.wikimedia.org/r/366911 (owner: 10Dzahn) [22:22:11] (03CR) 10BryanDavis: "> LDAP_BIND ... but does require putting the user's LDAP password into the session (Bryan, can you clarify here?)." [puppet] - 10https://gerrit.wikimedia.org/r/366910 (owner: 10Paladox) [22:22:35] (03PS9) 10Dzahn: gerrit: make name of config template flexible [puppet] - 10https://gerrit.wikimedia.org/r/366911 [22:23:00] ACKNOWLEDGEMENT - novaadmin has roles in every project on labtestnet2001 is CRITICAL: In structured-wikiquote, user novaadmin should have roles [user, projectadmin] but has [uuser] andrew bogott Im about to fix this [22:31:52] (03CR) 10Krinkle: Function comments, parameters and stuffs (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/366771 (owner: 10Reedy) [22:33:10] (03CR) 10Dzahn: "no-op in prod" [puppet] - 10https://gerrit.wikimedia.org/r/366911 (owner: 10Dzahn) [22:34:36] RECOVERY - novaadmin has roles in every project on labtestnet2001 is OK: novaadmin has the correct roles in all projects. 
[22:37:40] (03PS1) 10Madhuvishy: install_server: Add new partman recipe for labstore100[1-2] [puppet] - 10https://gerrit.wikimedia.org/r/366977 (https://phabricator.wikimedia.org/T158196) [22:38:03] (03CR) 10Dzahn: [C: 04-1] "i would say since my change above is now merged, you can simply edit your own template file for this" [puppet] - 10https://gerrit.wikimedia.org/r/366768 (owner: 10Paladox) [22:40:46] (03PS1) 10Jcrespo: Deprecate multi-source instance on dbstore2002 [puppet] - 10https://gerrit.wikimedia.org/r/366978 (https://phabricator.wikimedia.org/T169514) [22:41:47] (03PS2) 10Jcrespo: mariadb: Deprecate multi-source instance on dbstore2002 [puppet] - 10https://gerrit.wikimedia.org/r/366978 (https://phabricator.wikimedia.org/T169514) [22:42:34] (03PS2) 10Madhuvishy: install_server: Add new partman recipe for labstore100[1-2] [puppet] - 10https://gerrit.wikimedia.org/r/366977 (https://phabricator.wikimedia.org/T158196) [22:42:47] 10Operations, 10media-storage: upload.wikimedia.org needs a Wikimedia 404 error page - https://phabricator.wikimedia.org/T37053#401413 (10Krinkle) Previously: * 404 Not Found for an original: Swift * 404 Not Found for a thumbnail: MediaWiki /w/thumb.php (proxied by Swift) Screenshot (from T113114) > | Swift |... [22:43:35] 10Operations, 10media-storage: upload.wikimedia.org should serve a Wikimedia 404 error page when file not found in Swift - https://phabricator.wikimedia.org/T37053#3462171 (10Krinkle) [22:43:55] (03CR) 10Jcrespo: [C: 032] mariadb: Deprecate multi-source instance on dbstore2002 [puppet] - 10https://gerrit.wikimedia.org/r/366978 (https://phabricator.wikimedia.org/T169514) (owner: 10Jcrespo) [22:49:22] (03PS3) 10Madhuvishy: install_server: Add new partman recipe for labstore100[1-2] [puppet] - 10https://gerrit.wikimedia.org/r/366977 (https://phabricator.wikimedia.org/T158196) [22:49:24] (03CR) 10Jcrespo: [C: 04-1] "This is not enough, we need to set it to infinity." 
[software] - 10https://gerrit.wikimedia.org/r/365255 (owner: 10Jcrespo) [22:53:21] (03PS4) 10Madhuvishy: install_server: Add new partman recipe for labstore100[1-2] [puppet] - 10https://gerrit.wikimedia.org/r/366977 (https://phabricator.wikimedia.org/T158196) [22:54:18] robh: (or anyone else) can I interest you in a quick review? https://gerrit.wikimedia.org/r/#/c/366977 :) [22:54:50] why swap? [22:55:00] we've been eliminating swap on most of our systems [22:55:32] https://phabricator.wikimedia.org/T156955 [22:55:53] aah - the other labstore boxes - 1004 and 5 have 40G swap on them and 128G RAM or something, was trying to be somewhat consistent [22:56:47] (03CR) 10RobH: "Why include a swap partition at all? We've been moving away from inclusion on most of the cluster, see https://phabricator.wikimedia.org/" [puppet] - 10https://gerrit.wikimedia.org/r/366977 (https://phabricator.wikimedia.org/T158196) (owner: 10Madhuvishy) [22:56:49] these boxes are older with 32G RAM [22:57:04] yes but is the swap used on them at all? [22:57:07] I'd expect no [22:57:28] or else we would have gotten you systems with more memory. swap is not a cost effective use of disk space versus the cost of ram in most of our use cases [22:58:17] and just due to inclusion in the past doesn't mean we should overcomplicate partitioning by including without question =] [22:58:29] so i'd kill the swap unless you have demonstrated that the labstore uses the swap file [22:58:37] (and if it did, then we're not buying enough memory for them ;) [22:58:48] as it is, we're trying to eliminate swap use unless needed [22:58:54] so adding a new recipe with it is working backwards [22:59:03] right - in the 1001/2 case - which is what i'm installing now - they are older, less beefier boxes. I'm not super sure why 1004 and 5 are setup with swap, but they are newer and have much higher RAM. [22:59:16] they have swap by mistake i would imagine [22:59:31] did those 1001/2 use swap in their past lives? 
id still urge no swap [22:59:45] unless using SSDs where they are fast enough to warrant offloading to storage rather than ram [22:59:47] its not effective [22:59:58] they had swap setup for sure. [23:00:00] not did they have it in past instances [23:00:03] did they USE it [23:00:12] most of the fleet had it at one point [23:00:15] but it wasnt used on any of it [23:00:21] I understand. I'm not sure, i'd have to check with Chase when he's back. [23:00:25] ie: physical memory was not the bottleneck [23:00:39] notice i didnt -1 you, im not going to block your work over swap determination ;] [23:00:58] cuz its just something we need to keep in mind and attempt to phase out when not needed [23:01:00] i'm equally curious about swap usage :) no worries! [23:01:08] its not important enough to hold you up if needing the hardware =] [23:01:32] but otherwise that recipe looks good, no telling until you test it though! i often have to push the recipe, then livehack a tweak or two on the install server [23:01:46] once it works, apply new patchset to tweak puppet repo and update puppet on installer, then install with auto partman [23:01:55] install with the puppet version of the auto partman file that is. [23:02:22] (this is why so many folks hate partman and leave their recipes to dc-ops ;) [23:02:51] madhuvishy: when you are ready to actually apply, im happy to assist if needed [23:02:56] yes :) you helped write the last one for me! [23:03:01] i mostly based off of it [23:06:38] (03CR) 10Madhuvishy: "I put in swap for consistency with the previous setup, and the current secondary labstore setup. 
It's understandable that swap is being ph" [puppet] - 10https://gerrit.wikimedia.org/r/366977 (https://phabricator.wikimedia.org/T158196) (owner: 10Madhuvishy) [23:07:54] (03PS5) 10Madhuvishy: install_server: Add new partman recipe for labstore100[1-2] [puppet] - 10https://gerrit.wikimedia.org/r/366977 (https://phabricator.wikimedia.org/T158196) [23:10:19] (03CR) 10Madhuvishy: [C: 032] install_server: Add new partman recipe for labstore100[1-2] [puppet] - 10https://gerrit.wikimedia.org/r/366977 (https://phabricator.wikimedia.org/T158196) (owner: 10Madhuvishy) [23:17:14] robh: somewhat dumb question - these boxes have been wiped out and re racked, should I follow https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Installation or use the reimage script to do OS installation. https://phabricator.wikimedia.org/T158913#3252214 is last status [23:18:03] if theyve been completely removed from everything, like puppet and icinga [23:18:09] you can just follow the lifecycle not reimage [23:18:19] no they haven't been removed from puppet and icinga [23:18:34] oh, then reimage is likely easier, ive only used it once or twice [23:18:48] most of my stuff is brand new, so reimage doesnt apply [23:19:03] right, okay - i'll do that then [23:19:45] (03CR) 10Smalyshev: [C: 031] logstash: Parse nginx access logs for wdqs [puppet] - 10https://gerrit.wikimedia.org/r/299825 (owner: 10BryanDavis) [23:37:13] 10Operations, 10Cloud-Services, 10Patch-For-Review, 10cloud-services-team (Kanban): Reimage labstore1001 and labstore1002 for DRBD storage setup - https://phabricator.wikimedia.org/T158196#3029409 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by madhuvishy on neodymium.eqiad.wmnet for hosts:...