[00:00:04] addshore, hashar, anomie, ostriches, aude, MaxSem, twentyafterfour, RoanKattouw, Dereckson, and thcipriani: Dear anthropoid, the time has come. Please deploy Evening SWAT (Max 8 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20170217T0000). [00:10:00] RECOVERY - puppet last run on mc1001 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures [00:14:34] 06Operations, 06Discovery, 10Traffic, 10Wikidata, and 2 others: compile number of http uses for http://www.wikidata.org/entity - https://phabricator.wikimedia.org/T154017#3035189 (10Krinkle) [00:28:46] (03Draft1) 10Paladox: Phabricator: Make ssh-phab port configurable [puppet] - 10https://gerrit.wikimedia.org/r/338294 [00:28:49] (03PS2) 10Paladox: Phabricator: Make ssh-phab port configurable [puppet] - 10https://gerrit.wikimedia.org/r/338294 [00:31:06] (03CR) 10Dzahn: [C: 04-1] "why would it not affect prod? Why change the port? And 23 is telnet." [puppet] - 10https://gerrit.wikimedia.org/r/338294 (owner: 10Paladox) [00:31:38] (03CR) 10Paladox: "> why would it not affect prod? Why change the port? And 23 is" [puppet] - 10https://gerrit.wikimedia.org/r/338294 (owner: 10Paladox) [00:56:25] (03CR) 10Dzahn: [C: 04-1] "No, we should no change the setup for labs, and if we really had to do then don't use port 23. The real solution is that the labs instance" [puppet] - 10https://gerrit.wikimedia.org/r/338294 (owner: 10Paladox) [00:56:40] PROBLEM - puppet last run on ms-be1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [01:02:07] (03CR) 10Chad: [C: 04-1] "Actually, per my comment on the (now) declined task, we shouldn't do this." [puppet] - 10https://gerrit.wikimedia.org/r/337397 (https://phabricator.wikimedia.org/T158298) (owner: 10Ladsgroup) [01:04:37] (03CR) 1020after4: "@dzahn: do you know how to assign two IPs to a labs instance? 
I thought a floating IP might work but since that uses NAT it seems like it " [puppet] - 10https://gerrit.wikimedia.org/r/338294 (owner: 10Paladox) [01:04:41] (03CR) 10Chad: [C: 04-1] "Reverse the logic at the very least, default is 22 labs can override in hiera." [puppet] - 10https://gerrit.wikimedia.org/r/338294 (owner: 10Paladox) [01:20:01] (03PS1) 1020after4: sshd-phab service config needs to be a template [puppet] - 10https://gerrit.wikimedia.org/r/338302 [01:22:43] (03CR) 10Paladox: sshd-phab service config needs to be a template (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/338302 (owner: 1020after4) [01:22:50] (03CR) 10Paladox: [C: 031] "Tested and needed :)" [puppet] - 10https://gerrit.wikimedia.org/r/338302 (owner: 1020after4) [01:24:40] RECOVERY - puppet last run on ms-be1002 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures [01:43:10] RECOVERY - Juniper alarms on asw-ulsfo.mgmt.ulsfo.wmnet is OK: JNX_ALARMS OK - 0 red alarms, 0 yellow alarms [01:49:50] RECOVERY - Host ripe-atlas-ulsfo is UP: PING OK - Packet loss = 0%, RTA = 78.68 ms [02:00:38] (03PS3) 10Paladox: Phabricator: Make ssh-phab port configurable [puppet] - 10https://gerrit.wikimedia.org/r/338294 [02:23:40] PROBLEM - puppet last run on dbproxy1009 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [02:27:49] !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.12) (duration: 06m 49s) [02:27:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:33:21] !log l10nupdate@tin ResourceLoader cache refresh completed at Fri Feb 17 02:33:21 UTC 2017 (duration 5m 32s) [02:33:27] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:45:24] (03PS2) 10Zppix: sshd-phab service config needs to be a template [puppet] - 10https://gerrit.wikimedia.org/r/338302 (owner: 1020after4) [02:47:00] PROBLEM - puppet last run on db1040 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [02:52:40] RECOVERY - puppet last run on dbproxy1009 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [03:15:00] RECOVERY - puppet last run on db1040 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [03:18:00] PROBLEM - puppet last run on elastic1027 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [03:22:40] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 607.89 seconds [03:26:40] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 115.92 seconds [03:31:50] (03PS4) 10Paladox: Phabricator: Make ssh-phab port configurable [puppet] - 10https://gerrit.wikimedia.org/r/338294 [03:34:23] (03PS5) 10Paladox: Phabricator: Make ssh-phab port configurable [puppet] - 10https://gerrit.wikimedia.org/r/338294 [03:43:40] PROBLEM - puppet last run on californium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [03:46:00] RECOVERY - puppet last run on elastic1027 is OK: OK: Puppet is currently enabled, last run 47 seconds ago with 0 failures [04:10:40] RECOVERY - puppet last run on californium is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [04:37:50] PROBLEM - puppet last run on cp4003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [05:04:50] RECOVERY - puppet last run on cp4003 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [06:26:40] PROBLEM - puppet last run on wtp1008 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:47:40] PROBLEM - puppet last run on lvs1004 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [06:52:06] (03PS1) 10Marostegui: db-codfw.php: Repool db2070 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338319 (https://phabricator.wikimedia.org/T156478) [06:55:40] RECOVERY - puppet last run on wtp1008 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [06:56:54] (03CR) 10Marostegui: [C: 032] db-codfw.php: Repool db2070 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338319 (https://phabricator.wikimedia.org/T156478) (owner: 10Marostegui) [06:58:23] (03Merged) 10jenkins-bot: db-codfw.php: Repool db2070 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338319 (https://phabricator.wikimedia.org/T156478) (owner: 10Marostegui) [06:58:33] (03CR) 10jenkins-bot: db-codfw.php: Repool db2070 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338319 (https://phabricator.wikimedia.org/T156478) (owner: 10Marostegui) [06:59:39] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2070 - T156478 (duration: 00m 48s) [06:59:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [06:59:47] T156478: Change rack for servers in s1 in codfw - https://phabricator.wikimedia.org/T156478 [07:02:31] 06Operations, 10ops-codfw, 10DBA, 13Patch-For-Review: Change rack for servers in s1 in codfw - https://phabricator.wikimedia.org/T156478#3035728 (10Marostegui) db2070 has been repooled. Thanks everyone for the help to move all these three servers! 
[07:02:46] 06Operations, 10ops-codfw, 10DBA, 13Patch-For-Review: Change rack for servers in s1 in codfw - https://phabricator.wikimedia.org/T156478#3035729 (10Marostegui) 05Open>03Resolved [07:15:40] RECOVERY - puppet last run on lvs1004 is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [07:26:06] 06Operations, 07Wikimedia-log-errors: Warning: timed out after 0.2 seconds when connecting to rdb1001.eqiad.wmnet [110]: Connection timed out - https://phabricator.wikimedia.org/T125735#3035758 (10Joe) Hi! I'm the one who suggested most of those timeout changes. Some have different historical reasons, but I th... [07:41:22] 06Operations, 06Project-Admins: Operations-related subprojects/tags reorganization - https://phabricator.wikimedia.org/T119944#1840761 (10Nemo_bis) >>! In T119944#2338184, @Aklapper wrote: > For the records, the following projects were changed from yellow tags to blue components lately: > #Diamond, #Elasticsea... [07:41:39] !log installing spice security updates [07:41:43] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [07:58:59] !log upgrading mw1261 to HHVM 3.12.14 [07:59:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:00:09] !log installing openssl 1.1.0e updates [08:00:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:09:54] (03PS11) 10Giuseppe Lavagetto: Add schema support [software/conftool] - 10https://gerrit.wikimedia.org/r/288881 (https://phabricator.wikimedia.org/T155823) [08:10:37] (03CR) 10jerkins-bot: [V: 04-1] Add schema support [software/conftool] - 10https://gerrit.wikimedia.org/r/288881 (https://phabricator.wikimedia.org/T155823) (owner: 10Giuseppe Lavagetto) [08:15:22] (03CR) 10Muehlenhoff: [C: 031] "Looks good to me. 
I remember that this was source of confusion during some server decom, where someone looked in the wrong YAML file, but " [puppet] - 10https://gerrit.wikimedia.org/r/338108 (https://phabricator.wikimedia.org/T156023) (owner: 10Elukey) [08:19:33] !log restarted nginx/prometheus in esams/ulsfo to pick up openssl update [08:19:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:23:04] (03PS12) 10Giuseppe Lavagetto: Add schema support [software/conftool] - 10https://gerrit.wikimedia.org/r/288881 (https://phabricator.wikimedia.org/T155823) [08:35:27] !log restart nginx on prometheus in eqiad/codfw to pick up openssl update [08:35:31] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:35:58] !log upgrading mw1262-mw1265 to HHVM 3.12.14 [08:36:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [08:36:58] (03CR) 10Marostegui: "We can probably go ahead and start deploying this next week as per: https://phabricator.wikimedia.org/T149418#3035817" [puppet] - 10https://gerrit.wikimedia.org/r/335816 (https://phabricator.wikimedia.org/T149418) (owner: 10Marostegui) [08:55:17] (03CR) 10Jcrespo: [C: 031] "You have my +1 long time ago :-)" [puppet] - 10https://gerrit.wikimedia.org/r/335816 (https://phabricator.wikimedia.org/T149418) (owner: 10Marostegui) [09:06:06] (03PS1) 10Muehlenhoff: ldap::client::utils: Move to require_package [puppet] - 10https://gerrit.wikimedia.org/r/338320 [09:13:07] (03PS1) 10Muehlenhoff: Add separate debdeploy server group for install/repository servers [puppet] - 10https://gerrit.wikimedia.org/r/338321 [09:15:18] (03CR) 10Muehlenhoff: [C: 032] Add separate debdeploy server group for install/repository servers [puppet] - 10https://gerrit.wikimedia.org/r/338321 (owner: 10Muehlenhoff) [09:22:52] !log restarting nginx on install1002/2002 to pick up new openssl [09:22:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:29:18] 06Operations, 
06Project-Admins: Operations-related subprojects/tags reorganization - https://phabricator.wikimedia.org/T119944#3035846 (10jcrespo) @Nemo_bis you are writing on an old, closed, first-step effort. Ops tickets reorganization keeps happening, but we were blocked on technical concerns on phabricator... [09:30:18] !log upgrade nginx on elastic1049-1052 for ssl upgrade [09:30:21] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:32:49] !log upgrade nginx on elasticsearch codfw for ssl upgrade [09:32:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:33:05] !log rolling restart of nginx on mw canaries to pick up openssl update [09:33:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:34:22] 06Operations, 13Patch-For-Review: Standardizing our partman recipes - https://phabricator.wikimedia.org/T156955#2990919 (10fgiunchedi) +1 on reducing the number of partman recipes! For swap I couldn't find an answer for recent kernel versions whether or not having swap helps under normal circumstances (IOW wi... [09:50:50] 06Operations, 13Patch-For-Review: Standardizing our partman recipes - https://phabricator.wikimedia.org/T156955#2990919 (10MoritzMuehlenhoff) As for swapping, there was an article on LWN a while ago indicating that swapping is beneficial on systems with modern I/O/SSDs: https://lwn.net/Articles/690079/ [09:54:33] !log rolling restart of nginx on mediawiki servers in codfw to pick up openssl update [09:54:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:56:39] (03PS1) 10Marostegui: generate_dsns_table.sh: Add new script for dsns. [software] - 10https://gerrit.wikimedia.org/r/338326 (https://phabricator.wikimedia.org/T154485) [09:56:44] (03CR) 10jerkins-bot: [V: 04-1] generate_dsns_table.sh: Add new script for dsns. 
[software] - 10https://gerrit.wikimedia.org/r/338326 (https://phabricator.wikimedia.org/T154485) (owner: 10Marostegui) [09:57:08] (03PS1) 10Gehel: elasticsearch - reimage elastic10(17|18|19|20) to jessie and move data to /srv [puppet] - 10https://gerrit.wikimedia.org/r/338327 (https://phabricator.wikimedia.org/T151326) [09:59:00] (03CR) 10Gehel: [C: 032] elasticsearch - reimage elastic10(17|18|19|20) to jessie and move data to /srv [puppet] - 10https://gerrit.wikimedia.org/r/338327 (https://phabricator.wikimedia.org/T151326) (owner: 10Gehel) [09:59:36] (03PS2) 10Marostegui: generate_dsns_table.sh: Add new script for dsns. [software] - 10https://gerrit.wikimedia.org/r/338326 (https://phabricator.wikimedia.org/T154485) [09:59:41] (03CR) 10jerkins-bot: [V: 04-1] generate_dsns_table.sh: Add new script for dsns. [software] - 10https://gerrit.wikimedia.org/r/338326 (https://phabricator.wikimedia.org/T154485) (owner: 10Marostegui) [10:01:21] !log gehel@puppetmaster1001 conftool action : set/pooled=no; selector: name=elastic10(17|18|19|20).eqiad.wmnet [10:01:25] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [10:01:34] (03CR) 10Jcrespo: [C: 031] "Could you add the commit summary as a comment on the file header?- otherwise I will forget what this is in a year's time." [software] - 10https://gerrit.wikimedia.org/r/338326 (https://phabricator.wikimedia.org/T154485) (owner: 10Marostegui) [10:03:44] (03PS3) 10Marostegui: generate_dsns_table.sh: Add new script for dsns. [software] - 10https://gerrit.wikimedia.org/r/338326 (https://phabricator.wikimedia.org/T154485) [10:03:49] (03CR) 10jerkins-bot: [V: 04-1] generate_dsns_table.sh: Add new script for dsns. [software] - 10https://gerrit.wikimedia.org/r/338326 (https://phabricator.wikimedia.org/T154485) (owner: 10Marostegui) [10:03:59] I wonder why is it doing the -V1... [10:04:05] let me help there [10:04:31] please do! 
:) [10:04:33] 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic1048-1052 - https://phabricator.wikimedia.org/T155790#3035887 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['elastic1017.eqiad.wmnet'] ``` The... [10:05:03] (03PS4) 10Jcrespo: generate_dsns_table.sh: Add new script for dsns. [software] - 10https://gerrit.wikimedia.org/r/338326 (https://phabricator.wikimedia.org/T154485) (owner: 10Marostegui) [10:05:08] (03CR) 10jerkins-bot: [V: 04-1] generate_dsns_table.sh: Add new script for dsns. [software] - 10https://gerrit.wikimedia.org/r/338326 (https://phabricator.wikimedia.org/T154485) (owner: 10Marostegui) [10:05:58] strange [10:06:17] what did you think it could be? [10:06:20] I rebased manually [10:06:34] sometimes,for some reason, it fails to rebase automatically [10:07:25] so either the repo is bad [10:07:28] let me try to force it to reexamine the change [10:07:35] 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic1048-1052 - https://phabricator.wikimedia.org/T155790#3035889 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['elastic1019.eqiad.wmnet'] ``` The... [10:07:51] (03CR) 10Marostegui: "recheck" [software] - 10https://gerrit.wikimedia.org/r/338326 (https://phabricator.wikimedia.org/T154485) (owner: 10Marostegui) [10:07:56] (03CR) 10jerkins-bot: [V: 04-1] generate_dsns_table.sh: Add new script for dsns. [software] - 10https://gerrit.wikimedia.org/r/338326 (https://phabricator.wikimedia.org/T154485) (owner: 10Marostegui) [10:08:01] weird [10:08:04] or it is the wrong message- it is complaining about the syntax, not the rebase [10:08:21] the syntax works (at least on neodymium) :) [10:08:33] no, I mean those extra spaces, etc. 
[10:08:37] ah [10:08:57] the thing is, the bot would say why it failed [10:09:19] for now, get rid of the tabs [10:09:22] and that [10:11:03] (03CR) 10Volans: [C: 031] "LGTM. I guess you plan to revert 02129508441abf1dea0feb3d8070ee16e3eb6fd0 after this." [puppet] - 10https://gerrit.wikimedia.org/r/338320 (owner: 10Muehlenhoff) [10:11:24] (03PS5) 10Marostegui: generate_dsns_table.sh: Add new script for dsns. [software] - 10https://gerrit.wikimedia.org/r/338326 (https://phabricator.wikimedia.org/T154485) [10:11:29] (03CR) 10jerkins-bot: [V: 04-1] generate_dsns_table.sh: Add new script for dsns. [software] - 10https://gerrit.wikimedia.org/r/338326 (https://phabricator.wikimedia.org/T154485) (owner: 10Marostegui) [10:12:08] 06Operations: Enhance account handling (meta bug) - https://phabricator.wikimedia.org/T142815#3035899 (10MoritzMuehlenhoff) [10:12:10] 06Operations, 07LDAP: Add wmf LDAP group members into nda group, delete wmf group - https://phabricator.wikimedia.org/T129786#3035897 (10MoritzMuehlenhoff) 05Open>03declined I don't think this is useful. With the current scheme we have the possibility to selectively grant some resources to staff only (as p... [10:13:10] PROBLEM - puppet last run on cp3034 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:17:03] there is no way to find that repo on jenkins to examine what is it actually complaining about, or I cannot find it [10:21:29] I would ask hasar or volans, if they changed something about it [10:21:35] 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic1048-1052 - https://phabricator.wikimedia.org/T155790#3035908 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['elastic1017.eqiad.wmnet'] ``` Of which those **FAILED**: ``` set(['elastic1017.eqi... [10:21:46] marostegui: let me check [10:21:57] Hey volans! 
Thanks :) [10:22:26] the message is about merge, and it merges with no issue on my side [10:22:36] either there is a validation with the wrong message [10:22:40] marostegui: looks like there was a conflict and the repository operations/software is not fast forward only [10:22:42] or the repo upstream [10:22:53] marostegui, there is not a conflict [10:22:55] allows to merge, but might be that the change was not auto-mergeable, not super sure [10:22:56] I merge locally [10:23:04] I upload [10:23:09] no possible conflicts [10:23:10] that would be really strange as it is a new file [10:23:16] it doesn't matter [10:23:26] I am sure I am merging starting from head [10:23:44] try yourself if you do not believe me [10:24:29] same from a fresh repo [10:24:58] I just cloned [10:25:09] * volans trying [10:25:45] either there is corruption/problem on cobalt [10:25:53] or the message is wrong [10:25:57] (03PS6) 10Volans: generate_dsns_table.sh: Add new script for dsns. [software] - 10https://gerrit.wikimedia.org/r/338326 (https://phabricator.wikimedia.org/T154485) (owner: 10Marostegui) [10:26:02] (03CR) 10jerkins-bot: [V: 04-1] generate_dsns_table.sh: Add new script for dsns. [software] - 10https://gerrit.wikimedia.org/r/338326 (https://phabricator.wikimedia.org/T154485) (owner: 10Marostegui) [10:26:06] see? 
[10:26:09] not lying [10:26:10] it might be, the other option is to try to close it and open a new one [10:26:16] I can try that too [10:26:17] 06Operations, 06Operations-Software-Development, 07HHVM, 13Patch-For-Review: Upgrade all mw* servers to debian jessie - https://phabricator.wikimedia.org/T143536#3035925 (10MoritzMuehlenhoff) [10:26:20] if it was related to this change [10:26:28] otherwise yes, local repo issue on jenkins side [10:27:30] let me try to abandon that one and push another one [10:27:34] (03PS1) 10Jcrespo: Testing new commit [software] - 10https://gerrit.wikimedia.org/r/338329 [10:27:38] ah, that too XD [10:27:39] (03CR) 10jerkins-bot: [V: 04-1] Testing new commit [software] - 10https://gerrit.wikimedia.org/r/338329 (owner: 10Jcrespo) [10:27:43] there we go [10:27:43] nope [10:28:22] (03PS1) 10Jcrespo: testing again [software] - 10https://gerrit.wikimedia.org/r/338330 [10:28:28] (03CR) 10jerkins-bot: [V: 04-1] testing again [software] - 10https://gerrit.wikimedia.org/r/338330 (owner: 10Jcrespo) [10:28:35] it is not the commit [10:28:43] (03Abandoned) 10Jcrespo: testing again [software] - 10https://gerrit.wikimedia.org/r/338330 (owner: 10Jcrespo) [10:28:59] maybe a preceding commit failed? [10:29:00] as in, the file [10:29:00] 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic1048-1052 - https://phabricator.wikimedia.org/T155790#3035928 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['elastic1019.eqiad.wmnet'] ``` and were **ALL** successful. 
[10:29:07] moritzm, maybe [10:29:10] this happened in the past IIRC [10:29:13] will look at cobalt [10:29:36] (03Abandoned) 10Jcrespo: Testing new commit [software] - 10https://gerrit.wikimedia.org/r/338329 (owner: 10Jcrespo) [10:29:43] (03PS1) 10Marostegui: test commit [software] - 10https://gerrit.wikimedia.org/r/338331 [10:29:49] (03CR) 10jerkins-bot: [V: 04-1] test commit [software] - 10https://gerrit.wikimedia.org/r/338331 (owner: 10Marostegui) [10:29:57] ^ that is simply a new line on the hosts file [10:30:04] and failed too... [10:30:36] (03Abandoned) 10Marostegui: test commit [software] - 10https://gerrit.wikimedia.org/r/338331 (owner: 10Marostegui) [10:31:15] maybe git is not there? [10:31:32] I'm looking at /srv/gerrit/git/operations/software.git on cobalt [10:38:59] jynus, marostegui it could also be related to yesterday's issue on Zuul I guess [10:39:23] volans: let's ask hashar once he is online too [10:39:27] issue, do you have more info? [10:40:02] or just the known slowdowns? [10:41:11] no, it was down, see in #-releng, there was an issue with nodepool of openstack [10:41:19] thanks [10:41:30] not sure if it can be related in any way though :) [10:41:37] it shouldn't [10:42:13] RECOVERY - puppet last run on cp3034 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [10:42:16] let me do one last test [10:43:18] but it only impacts that specific repo? [10:43:21] that would be strange no? 
[10:43:33] if it's a different queue maybe [10:43:54] a git fsck reports only one unreachable commit d6b7fcab405535324a9a3bad67054841e2d030da [10:44:00] i checked zuul and i wasn't able to catch my change (i am not too experienced with it so maybe i just missed it) [10:44:40] (03PS1) 10Muehlenhoff: Move the Diamond NTP collector to ntp::daemon [puppet] - 10https://gerrit.wikimedia.org/r/338333 [10:44:49] (03PS1) 10Jcrespo: Last test [software] - 10https://gerrit.wikimedia.org/r/338334 [10:44:55] (03CR) 10jerkins-bot: [V: 04-1] Last test [software] - 10https://gerrit.wikimedia.org/r/338334 (owner: 10Jcrespo) [10:45:38] (03Abandoned) 10Jcrespo: Last test [software] - 10https://gerrit.wikimedia.org/r/338334 (owner: 10Jcrespo) [10:46:03] PROBLEM - puppet last run on auth1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [10:54:22] (03PS1) 10Volans: Test commit to debug issue in CI [software] - 10https://gerrit.wikimedia.org/r/338335 [10:54:27] (03CR) 10jerkins-bot: [V: 04-1] Test commit to debug issue in CI [software] - 10https://gerrit.wikimedia.org/r/338335 (owner: 10Volans) [10:54:55] this was a try with a new clone [10:55:38] (03Abandoned) 10Volans: Test commit to debug issue in CI [software] - 10https://gerrit.wikimedia.org/r/338335 (owner: 10Volans) [10:56:14] he, I was there, too [10:56:22] (03CR) 10Muehlenhoff: "PCC: http://puppet-compiler.wmflabs.org/5499/" [puppet] - 10https://gerrit.wikimedia.org/r/338333 (owner: 10Muehlenhoff) [10:56:32] also I tried sending one not rebased on the latest change [10:56:58] I looked at jenkins logs from the UI, nothing obvious there [10:57:52] is it really failing to merge? [10:57:58] what if we force it? [10:58:09] maybe only zuul is failing to merge it? [10:59:07] (03PS7) 10Jcrespo: generate_dsns_table.sh: Add new script for dsns. 
[software] - 10https://gerrit.wikimedia.org/r/338326 (https://phabricator.wikimedia.org/T154485) (owner: 10Marostegui) [10:59:13] (03CR) 10jerkins-bot: [V: 04-1] generate_dsns_table.sh: Add new script for dsns. [software] - 10https://gerrit.wikimedia.org/r/338326 (https://phabricator.wikimedia.org/T154485) (owner: 10Marostegui) [10:59:16] (03CR) 10Jcrespo: [V: 032 C: 032] generate_dsns_table.sh: Add new script for dsns. [software] - 10https://gerrit.wikimedia.org/r/338326 (https://phabricator.wikimedia.org/T154485) (owner: 10Marostegui) [10:59:42] interesting... [11:00:03] yeah, it is jenkins copy what it is failing [11:00:08] I can see it merged fine on another copy of the repo I have yes [11:00:47] I can see the job beeing scheduled on: [11:00:50] Worker ci-jessie-wikimedia-531875_exec-0 scheduling tox-jessie build #16113 on ci-jessie-wikimedia-531875 with UUID 79cc7d4c9602426298211b9a2516964c [11:01:05] it is "their" copy that fails [11:01:18] sorry wrong paste [11:01:28] Worker ci-jessie-wikimedia-531944_exec-0 scheduling tox-jessie build #16114 on ci-jessie-wikimedia-531944 with UUID 0b34250bf5eb44adbe5bc9c4c2579330 [11:01:31] this is the right one [11:01:53] it merges well, cobalt is ok [11:02:02] clients can see it, etc. [11:02:12] yep, totally [11:02:16] I can see it fine [11:03:44] let's report the issue, move on [11:04:03] I will do that, is that release? [11:04:43] it is zuul/jenkins [11:05:04] ok! I will create the task [11:05:06] cheers! [11:05:09] 06Operations, 06Release-Engineering-Team, 05DC-Switchover-Prep-Q3-2016-17: Understand the preparedness of misc services for datacenter switchover - https://phabricator.wikimedia.org/T156937#3036009 (10jcrespo) [11:05:12] 06Operations, 13Patch-For-Review: Upgrade fluorine to trusty/jessie - https://phabricator.wikimedia.org/T123728#3036008 (10jcrespo) [11:08:41] (03CR) 10Volans: "See inline." 
(031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/338119 (https://phabricator.wikimedia.org/T123728) (owner: 10Filippo Giunchedi) [11:11:43] !log restarting nginx on sodium (mirrors.wikimedia.org) to pick up openssl update [11:11:47] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:13:56] 06Operations, 06Labs: Can't create account "Trizek (WMF)" - https://phabricator.wikimedia.org/T158408#3036058 (10Trizek-WMF) [11:14:03] RECOVERY - puppet last run on auth1001 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [11:15:59] (03PS2) 10Marostegui: .profile: Add .profile file [puppet] - 10https://gerrit.wikimedia.org/r/329133 [11:18:23] (03Abandoned) 10Ladsgroup: gerrit: Make blue buttons look like OOUI [puppet] - 10https://gerrit.wikimedia.org/r/337397 (https://phabricator.wikimedia.org/T158298) (owner: 10Ladsgroup) [11:19:02] 06Operations, 06Labs: Can't create account "Trizek (WMF)" - https://phabricator.wikimedia.org/T158408#3036058 (10MoritzMuehlenhoff) There's no shell account in labs for "trizek-wmf", did you mean "trizek"? [11:19:40] (03CR) 10Jcrespo: [C: 031] .profile: Add .profile file [puppet] - 10https://gerrit.wikimedia.org/r/329133 (owner: 10Marostegui) [11:23:32] (03CR) 10Marostegui: [C: 032] .profile: Add .profile file [puppet] - 10https://gerrit.wikimedia.org/r/329133 (owner: 10Marostegui) [11:24:28] (03CR) 10Marostegui: "https://puppet-compiler.wmflabs.org/5501/" [puppet] - 10https://gerrit.wikimedia.org/r/329133 (owner: 10Marostegui) [11:26:44] PROBLEM - puppet last run on wtp1010 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/home/marostegui/.profile] [11:26:53] PROBLEM - puppet last run on elastic1021 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. 
Failed resources (up to 3 shown): File[/home/marostegui/.profile] [11:26:53] PROBLEM - puppet last run on db2064 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/home/marostegui/.profile] [11:27:23] PROBLEM - puppet last run on mw1228 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/home/marostegui/.profile] [11:27:31] (03CR) 10Volans: "Rectification of my previous comment" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/338119 (https://phabricator.wikimedia.org/T123728) (owner: 10Filippo Giunchedi) [11:27:34] checking that [11:27:37] might be a race condition [11:28:44] a puppet run fixes it [11:28:53] RECOVERY - puppet last run on db2064 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures [11:29:13] RECOVERY - puppet last run on mw1228 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [11:29:43] RECOVERY - puppet last run on wtp1010 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [11:29:53] marostegui: still, it shouldn't, right _joe_ ? [11:30:53] RECOVERY - puppet last run on elastic1021 is OK: OK: Puppet is currently enabled, last run 59 seconds ago with 0 failures [11:30:56] I am checking servers that just ran puppet and they worked fine [11:31:12] seems like a race, the file was not yet there although it was referenced [11:31:38] 06Operations, 06Labs: Can't create account "Trizek (WMF)" - https://phabricator.wikimedia.org/T158408#3036123 (10Trizek-WMF) "trizek" is my volunteer account, which has been very quickly created during a tech workshop. I don't use it for the moment and I prefer to have a separate account for my WMF work. Did... 
[11:32:11] 06Operations, 06Labs: Can't create account "Trizek (WMF)" - https://phabricator.wikimedia.org/T158408#3036058 (10scfc) I believe this is due to https://wikitech.wikimedia.org/wiki/MediaWiki:Titleblacklist denying account names that contain "(WMF)". So an account "Trizek (WMF)" would probably have to be create... [11:33:48] (03PS5) 10Filippo Giunchedi: udp2log: mirror traffic via udpmirror.py [puppet] - 10https://gerrit.wikimedia.org/r/338119 (https://phabricator.wikimedia.org/T123728) [11:34:10] (03CR) 10Filippo Giunchedi: udp2log: mirror traffic via udpmirror.py (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/338119 (https://phabricator.wikimedia.org/T123728) (owner: 10Filippo Giunchedi) [11:35:31] 06Operations, 06Labs: Can't create account "Trizek (WMF)" - https://phabricator.wikimedia.org/T158408#3036129 (10MoritzMuehlenhoff) Looks like the account blacklist indeed. So either choose a different name or check with Labs Admins whether there's a way to let them create it manually. [11:39:53] PROBLEM - puppet last run on lvs3002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [11:40:21] 06Operations, 06Analytics-Kanban, 10Traffic, 15User-Elukey: Periodic 500s from piwik.wikimedia.org - https://phabricator.wikimedia.org/T154558#3036155 (10Milimetric) yes, definitely has been going on for a while because we never really looked at it. We assumed the numbers piwik reported made sense because... 
[11:48:56] (03PS1) 10Marostegui: db-eqiad.php: Remove old comments [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338337 (https://phabricator.wikimedia.org/T153300) [11:49:33] (03PS3) 10Milimetric: Symlink reportupdater output into published-datasets [puppet] - 10https://gerrit.wikimedia.org/r/337672 (https://phabricator.wikimedia.org/T125854) [11:50:06] (03Abandoned) 10Marostegui: Revert "db-eqiad.php: Depool db1028" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/336767 (owner: 10Marostegui) [11:50:44] (03CR) 10Marostegui: [C: 032] db-eqiad.php: Remove old comments [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338337 (https://phabricator.wikimedia.org/T153300) (owner: 10Marostegui) [11:52:51] (03Merged) 10jenkins-bot: db-eqiad.php: Remove old comments [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338337 (https://phabricator.wikimedia.org/T153300) (owner: 10Marostegui) [11:53:09] (03CR) 10jenkins-bot: db-eqiad.php: Remove old comments [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338337 (https://phabricator.wikimedia.org/T153300) (owner: 10Marostegui) [11:54:19] !log marostegui@tin Synchronized wmf-config/db-eqiad.php: Clean up db1028 old comments - T153300 (duration: 00m 41s) [11:54:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [11:54:24] T153300: Remove partitions from metawiki.pagelinks in s7 - https://phabricator.wikimedia.org/T153300 [11:56:15] (03PS1) 10Marostegui: .bashrc: Adding extra space [puppet] - 10https://gerrit.wikimedia.org/r/338339 [11:56:18] (03PS4) 10Milimetric: Symlink reportupdater output to published-datasets [puppet] - 10https://gerrit.wikimedia.org/r/337672 (https://phabricator.wikimedia.org/T125854) [11:57:17] (03PS2) 10Marostegui: .bashrc: Add extra space to PS1 [puppet] - 10https://gerrit.wikimedia.org/r/338339 [12:00:09] (03PS1) 10MarcoAurelio: 'shellmanagers' to 'shellmanager' on wikitech.wikimedia.org [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338340 
(https://phabricator.wikimedia.org/T158039) [12:01:53] PROBLEM - Check systemd state on db2070 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. [12:02:23] PROBLEM - Check whether ferm is active by checking the default input chain on db2070 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly [12:04:59] (03CR) 10Volans: [C: 031] "LGTM" [puppet] - 10https://gerrit.wikimedia.org/r/338119 (https://phabricator.wikimedia.org/T123728) (owner: 10Filippo Giunchedi) [12:06:02] was db2070 booted recently? [12:06:13] jynus: same issue as db2062 [12:06:16] I will take care of it [12:06:29] ah, it could be network goes wild for a bit [12:06:36] jynus: yep, looks so [12:06:40] if ip change until puppet runs and then reboot, etc. [12:06:43] (03PS1) 10Marostegui: db-codfw.php: Depool db2070 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338341 (https://phabricator.wikimedia.org/T156478) [12:07:16] that is good to know, so reimage rebooting is more than justified [12:07:26] and we should do the same when we do not reimage :-) [12:08:03] RECOVERY - puppet last run on lvs3002 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [12:10:44] there's an error in ferm startup; it tried to db1011 and that failed [12:11:09] this sometimes happens during boot, it's a race between the time DNS resolution is ready and ferm startup [12:11:29] I've now started it [12:11:53] RECOVERY - Check systemd state on db2070 is OK: OK - running: The system is fully operational [12:12:23] RECOVERY - Check whether ferm is active by checking the default input chain on db2070 is OK: OK ferm input default policy is set [12:14:14] ah, good thanks!! 
looking good on icinga now apart from ntp [12:15:17] I will take care of it after lunch [12:15:44] (03CR) 10Marostegui: [C: 032] db-codfw.php: Depool db2070 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338341 (https://phabricator.wikimedia.org/T156478) (owner: 10Marostegui) [12:17:28] 06Operations, 10ops-codfw: codfw: ms-be2028-ms-be2039 rack/setup - https://phabricator.wikimedia.org/T158337#3036278 (10Aklapper) [12:17:28] (03Merged) 10jenkins-bot: db-codfw.php: Depool db2070 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338341 (https://phabricator.wikimedia.org/T156478) (owner: 10Marostegui) [12:17:36] (03CR) 10jenkins-bot: db-codfw.php: Depool db2070 [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338341 (https://phabricator.wikimedia.org/T156478) (owner: 10Marostegui) [12:18:27] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2070 - T156478 (duration: 00m 41s) [12:18:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:18:32] T156478: Change rack for servers in s1 in codfw - https://phabricator.wikimedia.org/T156478 [12:21:50] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Depool db2070 - T156478 (duration: 00m 41s) [12:21:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [12:22:00] (03PS1) 10Legoktm: toollabs: Update tools.wmflabs.org links to use HTTPS [puppet] - 10https://gerrit.wikimedia.org/r/338342 [12:43:53] PROBLEM - puppet last run on ganeti1004 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [12:44:23] PROBLEM - Host mw2256 is DOWN: PING CRITICAL - Packet loss = 100% [12:44:53] RECOVERY - Host mw2256 is UP: PING OK - Packet loss = 0%, RTA = 36.20 ms [12:45:05] (03PS1) 10Muehlenhoff: Add some docs on new account attributes [puppet] - 10https://gerrit.wikimedia.org/r/338345 [12:45:43] marostegui: jynys: sorry for operations/software it is a corner case wrong behavior of zuul :( [12:48:01] (03CR) 10Hashar: "recheck" [software] - 10https://gerrit.wikimedia.org/r/338326 (https://phabricator.wikimedia.org/T154485) (owner: 10Marostegui) [12:48:44] (03PS1) 10Tim Landscheidt: Tools: Fix patterns in exec-manage for release versions [puppet] - 10https://gerrit.wikimedia.org/r/338346 [12:48:46] (03PS1) 10Tim Landscheidt: Tools: Let exec-manage fail if it cannot determine release version [puppet] - 10https://gerrit.wikimedia.org/r/338347 [12:49:25] (03CR) 10Hashar: "recheck" [software] - 10https://gerrit.wikimedia.org/r/338326 (https://phabricator.wikimedia.org/T154485) (owner: 10Marostegui) [12:55:28] 06Operations: Enhance account handling (meta bug) - https://phabricator.wikimedia.org/T142815#3036371 (10MoritzMuehlenhoff) [12:55:31] 06Operations, 13Patch-For-Review: Optional expiry date for user accounts - https://phabricator.wikimedia.org/T142816#3036369 (10MoritzMuehlenhoff) 05Open>03Resolved Privileged LDAP and shell accounts can now have an expiry date. Instructions have been added to https://wikitech.wikimedia.org/wiki/Ops_Clini... 
[12:56:48] (03PS5) 10Rush: labstore: check should search for exact mount match [puppet] - 10https://gerrit.wikimedia.org/r/333230 (https://phabricator.wikimedia.org/T155820) (owner: 10Hashar) [12:56:56] (03CR) 10Rush: [V: 032 C: 032] labstore: check should search for exact mount match [puppet] - 10https://gerrit.wikimedia.org/r/333230 (https://phabricator.wikimedia.org/T155820) (owner: 10Hashar) [12:57:07] chasemp: \O/ [12:57:56] (03PS5) 10Hashar: jenkins: allow access log to be flipped [puppet] - 10https://gerrit.wikimedia.org/r/337385 [12:58:21] 06Operations: Harmonise "Directory Managers" group - https://phabricator.wikimedia.org/T157131#3036392 (10MoritzMuehlenhoff) @faidon and @Andrew ; you are currently the only two non-role members in that group; are you using the group member ship to make LDAP changes or do you typically use cn=admin or tool front... [12:58:35] (03CR) 10Hashar: "I removed the validate_bool(). As Chad said, that does not offer much value." [puppet] - 10https://gerrit.wikimedia.org/r/337385 (owner: 10Hashar) [13:00:09] 06Operations: Harmonise "Directory Managers" group - https://phabricator.wikimedia.org/T157131#3036394 (10faidon) I typically use my own account rather than cn=admin (as to not share a password and provide accountability of who made the changes) but I'm happy to change my ways. 
[13:04:20] (03PS7) 10Hashar: jenkins: allow changing the web service TCP port [puppet] - 10https://gerrit.wikimedia.org/r/337388 [13:05:14] (03CR) 10jerkins-bot: [V: 04-1] jenkins: allow changing the web service TCP port [puppet] - 10https://gerrit.wikimedia.org/r/337388 (owner: 10Hashar) [13:11:02] (03PS2) 10Hashar: jenkins: add basic specs [puppet] - 10https://gerrit.wikimedia.org/r/337836 [13:11:55] RECOVERY - puppet last run on ganeti1004 is OK: OK: Puppet is currently enabled, last run 4 seconds ago with 0 failures [13:16:54] (03PS6) 10Hashar: jenkins: migrate to systemd [puppet] - 10https://gerrit.wikimedia.org/r/337404 [13:18:41] (03PS8) 10Hashar: jenkins: allow changing the web service TCP port [puppet] - 10https://gerrit.wikimedia.org/r/337388 [13:22:50] (03PS1) 10Rush: labstore: nfs-mount-manager throw notice if symlink [puppet] - 10https://gerrit.wikimedia.org/r/338351 [13:23:37] (03PS2) 10Rush: Tools: Fix patterns in exec-manage for release versions [puppet] - 10https://gerrit.wikimedia.org/r/338346 (owner: 10Tim Landscheidt) [13:23:49] (03CR) 10Rush: [V: 032 C: 032] Tools: Fix patterns in exec-manage for release versions [puppet] - 10https://gerrit.wikimedia.org/r/338346 (owner: 10Tim Landscheidt) [13:23:59] (03CR) 10Rush: [C: 032] Tools: Let exec-manage fail if it cannot determine release version [puppet] - 10https://gerrit.wikimedia.org/r/338347 (owner: 10Tim Landscheidt) [13:24:10] (03PS2) 10Rush: Tools: Let exec-manage fail if it cannot determine release version [puppet] - 10https://gerrit.wikimedia.org/r/338347 (owner: 10Tim Landscheidt) [13:24:17] (03CR) 10Rush: [V: 032 C: 032] Tools: Let exec-manage fail if it cannot determine release version [puppet] - 10https://gerrit.wikimedia.org/r/338347 (owner: 10Tim Landscheidt) [13:24:38] (03PS3) 10Tim Landscheidt: postgresql: Only set user password if different [puppet] - 10https://gerrit.wikimedia.org/r/329328 [13:40:05] (03PS1) 10Marostegui: Revert "db-codfw.php: Depool db2070" [mediawiki-config] - 
10https://gerrit.wikimedia.org/r/338354 [13:42:46] (03PS3) 10Marostegui: .bashrc: Add extra space to PS1 [puppet] - 10https://gerrit.wikimedia.org/r/338339 [13:43:44] (03CR) 10Tim Landscheidt: [C: 031] toollabs: Update tools.wmflabs.org links to use HTTPS [puppet] - 10https://gerrit.wikimedia.org/r/338342 (owner: 10Legoktm) [13:44:05] (03CR) 10Marostegui: [C: 032] .bashrc: Add extra space to PS1 [puppet] - 10https://gerrit.wikimedia.org/r/338339 (owner: 10Marostegui) [13:44:39] 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic1048-1052 - https://phabricator.wikimedia.org/T155790#3036445 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['elastic1018.eqiad.wmnet'] ``` The... [13:44:59] (03CR) 10Marostegui: [C: 032] Revert "db-codfw.php: Depool db2070" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338354 (owner: 10Marostegui) [13:45:01] 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic1048-1052 - https://phabricator.wikimedia.org/T155790#3036447 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['elastic1020.eqiad.wmnet'] ``` The... [13:46:03] PROBLEM - Check Varnish expiry mailbox lag on cp1072 is CRITICAL: CRITICAL: expiry mailbox lag is 138409 [13:46:23] PROBLEM - puppet last run on mw1205 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [13:46:31] (03Merged) 10jenkins-bot: Revert "db-codfw.php: Depool db2070" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338354 (owner: 10Marostegui) [13:47:11] (03CR) 10jenkins-bot: Revert "db-codfw.php: Depool db2070" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/338354 (owner: 10Marostegui) [13:47:40] !log marostegui@tin Synchronized wmf-config/db-codfw.php: Repool db2070 - T156478 (duration: 00m 45s) [13:47:46] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:47:46] T156478: Change rack for servers in s1 in codfw - https://phabricator.wikimedia.org/T156478 [14:02:20] (03CR) 10Muehlenhoff: [C: 032] Add some docs on new account attributes [puppet] - 10https://gerrit.wikimedia.org/r/338345 (owner: 10Muehlenhoff) [14:02:26] (03PS2) 10Muehlenhoff: Add some docs on new account attributes [puppet] - 10https://gerrit.wikimedia.org/r/338345 [14:02:58] (03PS1) 10Giuseppe Lavagetto: package_builder: install debian-keyring [puppet] - 10https://gerrit.wikimedia.org/r/338357 [14:03:30] (03CR) 10Giuseppe Lavagetto: [V: 032 C: 032] package_builder: install debian-keyring [puppet] - 10https://gerrit.wikimedia.org/r/338357 (owner: 10Giuseppe Lavagetto) [14:04:05] (03PS3) 10Muehlenhoff: Add some docs on new account attributes [puppet] - 10https://gerrit.wikimedia.org/r/338345 [14:06:03] RECOVERY - Check Varnish expiry mailbox lag on cp1072 is OK: OK: expiry mailbox lag is 11225 [14:06:50] (03Abandoned) 10Hashar: (WIP) contint: Sonatype Nexus (WIP) [puppet] - 10https://gerrit.wikimedia.org/r/314751 (https://phabricator.wikimedia.org/T147635) (owner: 10Hashar) [14:11:55] moritzm: seems you have one unmerged change on puppetmaster (seems docs only) want me to merge? 
[14:13:02] (03PS7) 10Hashar: contint: move from /mnt to /srv [puppet] - 10https://gerrit.wikimedia.org/r/312523 (https://phabricator.wikimedia.org/T146381) [14:13:04] (03PS2) 10Hashar: Migrate puppet compiler instance from /mnt to /srv [puppet] - 10https://gerrit.wikimedia.org/r/330412 (https://phabricator.wikimedia.org/T146381) [14:13:13] moritzm: it is just a readme... I'm merging it [14:15:23] RECOVERY - puppet last run on mw1205 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [14:17:52] (03PS1) 10Muehlenhoff: Update to 4.4.49 [debs/linux44] - 10https://gerrit.wikimedia.org/r/338358 [14:18:03] 06Operations, 06Labs, 10wikitech.wikimedia.org: Can't create account "Trizek (WMF)" - https://phabricator.wikimedia.org/T158408#3036496 (10MarcoAurelio) Not sure if it applies to wikitech but the global titleblacklist at Meta do also block such usernames from creation. This however can be bypassed by any use... [14:18:05] gehel: sorry, yes please [14:18:14] moritzm: done [14:19:42] 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic1048-1052 - https://phabricator.wikimedia.org/T155790#3036501 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['elastic1020.eqiad.wmnet'] ``` and were **ALL** successful. [14:20:08] (03CR) 10Hashar: "Giuseppe, I would like to migrate the puppet compiler from /mnt to using /srv. 
I already migrated all other CI slaves by cherry-picking ht" [puppet] - 10https://gerrit.wikimedia.org/r/330412 (https://phabricator.wikimedia.org/T146381) (owner: 10Hashar) [14:24:35] (03Abandoned) 10Hashar: Initial debianization [debs/geckodriver] - 10https://gerrit.wikimedia.org/r/294293 (https://phabricator.wikimedia.org/T137797) (owner: 10Hashar) [14:25:03] PROBLEM - Check Varnish expiry mailbox lag on cp1072 is CRITICAL: CRITICAL: expiry mailbox lag is 179620 [14:26:03] PROBLEM - Check Varnish expiry mailbox lag on cp1074 is CRITICAL: CRITICAL: expiry mailbox lag is 234199 [14:28:15] (03PS3) 10Hashar: wmflib: os_version now fail when lsb vars are missing [puppet] - 10https://gerrit.wikimedia.org/r/308882 [14:38:30] (03PS6) 10Filippo Giunchedi: udp2log: mirror traffic via udpmirror.py [puppet] - 10https://gerrit.wikimedia.org/r/338119 (https://phabricator.wikimedia.org/T123728) [14:42:03] (03PS1) 10Muehlenhoff: Make realname optional in account check script [puppet] - 10https://gerrit.wikimedia.org/r/338362 [14:45:18] (03CR) 10Filippo Giunchedi: [V: 032 C: 032] udp2log: mirror traffic via udpmirror.py [puppet] - 10https://gerrit.wikimedia.org/r/338119 (https://phabricator.wikimedia.org/T123728) (owner: 10Filippo Giunchedi) [14:48:05] (03PS1) 10Volans: Fix absolute path and remove override defaults [software/cumin] - 10https://gerrit.wikimedia.org/r/338363 (https://phabricator.wikimedia.org/T154588) [14:48:09] <_joe_> !log uploaded clustershell 1.7.3, tqdm, pyparsing to jessie-wikimedia in preparation for cumin [14:48:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [14:48:20] <_joe_> volans: ^^ {{done}} [14:48:50] _joe_: thanks a lot for volounteering for it :) [14:51:37] (03PS1) 10Muehlenhoff: Reorder check for timesyncd or ntpd [puppet] - 10https://gerrit.wikimedia.org/r/338364 [14:52:15] (03PS1) 10Filippo Giunchedi: udp2log: add python3, fix udpmirror perms [puppet] - 10https://gerrit.wikimedia.org/r/338365 [14:53:29] (03PS2) 
10Eevans: Revert "Enable Prometheus exporter on restbase1007 (canary)" [puppet] - 10https://gerrit.wikimedia.org/r/338010 (https://phabricator.wikimedia.org/T155120) [14:54:48] (03CR) 10Filippo Giunchedi: [V: 032 C: 032] udp2log: add python3, fix udpmirror perms [puppet] - 10https://gerrit.wikimedia.org/r/338365 (owner: 10Filippo Giunchedi) [14:56:28] (03CR) 10Giuseppe Lavagetto: [C: 031] Fix absolute path and remove override defaults [software/cumin] - 10https://gerrit.wikimedia.org/r/338363 (https://phabricator.wikimedia.org/T154588) (owner: 10Volans) [14:58:47] (03PS2) 10Muehlenhoff: Make realname optional in account check script [puppet] - 10https://gerrit.wikimedia.org/r/338362 [15:03:35] (03CR) 10Muehlenhoff: [C: 032] Make realname optional in account check script [puppet] - 10https://gerrit.wikimedia.org/r/338362 (owner: 10Muehlenhoff) [15:05:46] (03CR) 10Filippo Giunchedi: "PCC https://puppet-compiler.wmflabs.org/5503/" [puppet] - 10https://gerrit.wikimedia.org/r/333247 (owner: 10Filippo Giunchedi) [15:10:22] (03PS3) 10Filippo Giunchedi: Revert "Enable Prometheus exporter on restbase1007 (canary)" [puppet] - 10https://gerrit.wikimedia.org/r/338010 (https://phabricator.wikimedia.org/T155120) (owner: 10Eevans) [15:11:02] ACKNOWLEDGEMENT - Check Varnish expiry mailbox lag on cp1072 is CRITICAL: CRITICAL: expiry mailbox lag is 567613 Brandon Black These are 0.5-1 day from natural restart. No 503 impact yet, but pattern looks scary. Alert thresholds still experimental, will monitor situation today. [15:11:02] ACKNOWLEDGEMENT - Check Varnish expiry mailbox lag on cp1074 is CRITICAL: CRITICAL: expiry mailbox lag is 635553 Brandon Black These are 0.5-1 day from natural restart. No 503 impact yet, but pattern looks scary. Alert thresholds still experimental, will monitor situation today. [15:12:27] PROBLEM - puppet last run on restbase1010 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:13:23] (03CR) 10Filippo Giunchedi: [C: 032] Revert "Enable Prometheus exporter on restbase1007 (canary)" [puppet] - 10https://gerrit.wikimedia.org/r/338010 (https://phabricator.wikimedia.org/T155120) (owner: 10Eevans) [15:17:16] 06Operations, 10ops-eqiad, 10DBA: Replace BBU for db1060 - https://phabricator.wikimedia.org/T158194#3036566 (10Marostegui) @Cmjohnson sorry to push, but were you able to see if there's a replacement BBU? I wouldn't like to leave the server with WriteBack forced without the BBU as we might lose data if there... [15:17:24] (03CR) 10Volans: [C: 032] Fix absolute path and remove override defaults [software/cumin] - 10https://gerrit.wikimedia.org/r/338363 (https://phabricator.wikimedia.org/T154588) (owner: 10Volans) [15:18:07] PROBLEM - puppet last run on mw1267 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:19:53] 06Operations, 06Labs, 06Release-Engineering-Team: contintcloud project thinks it is using 206 fixed-ip quota errantly - https://phabricator.wikimedia.org/T158350#3036573 (10Andrew) thanks for troubleshooting -- I'll dig in the source and try to see how it's computing that quota count. 
[15:21:08] (03Merged) 10jenkins-bot: Fix absolute path and remove override defaults [software/cumin] - 10https://gerrit.wikimedia.org/r/338363 (https://phabricator.wikimedia.org/T154588) (owner: 10Volans) [15:26:08] !log T155120: Restarting Cassandra on restbase1007-a.eqiad.wmnet to disable Prometheus exporter agent [15:26:12] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [15:26:13] T155120: Enable Prometheus metrics export for Cassandra - https://phabricator.wikimedia.org/T155120 [15:29:49] (03PS2) 10Hashar: syntax: ignore stdlib Puppet 4 manifests [puppet] - 10https://gerrit.wikimedia.org/r/338143 [15:30:09] (03PS3) 10Hashar: syntax: ignore stdlib Puppet 4 manifests [puppet] - 10https://gerrit.wikimedia.org/r/338143 (https://phabricator.wikimedia.org/T154894) [15:32:43] may one please land the above patch to prevent syntax check to fail on the stdlib module please ? [15:32:46] it uses puppet 4 manifests [15:33:00] when we use puppet 3 so that explodes whenever one runs bundle exec rake syntax:manifest [15:40:27] RECOVERY - puppet last run on restbase1010 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [15:45:07] RECOVERY - puppet last run on mw1267 is OK: OK: Puppet is currently enabled, last run 2 seconds ago with 0 failures [15:52:09] (03CR) 10Muehlenhoff: [C: 031] "Looks good to me" [puppet] - 10https://gerrit.wikimedia.org/r/330436 (https://phabricator.wikimedia.org/T154588) (owner: 10Volans) [15:52:47] PROBLEM - puppet last run on labvirt1007 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [15:54:24] (03PS1) 10Faidon Liambotis: Update check_timedatectl from dsa-nagios [puppet] - 10https://gerrit.wikimedia.org/r/338373 [15:56:09] (03CR) 10Faidon Liambotis: [C: 032] Update check_timedatectl from dsa-nagios [puppet] - 10https://gerrit.wikimedia.org/r/338373 (owner: 10Faidon Liambotis) [16:00:27] PROBLEM - puppet last run on ms-be1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:00:42] (03CR) 10VolkerE: ";) Your enthusiasm is highly appreciated." [puppet] - 10https://gerrit.wikimedia.org/r/337397 (https://phabricator.wikimedia.org/T158298) (owner: 10Ladsgroup) [16:02:23] (03PS1) 10Volans: Add debian/ directory for packaging [software/cumin] (debian) - 10https://gerrit.wikimedia.org/r/338374 (https://phabricator.wikimedia.org/T154588) [16:03:09] (03CR) 10jerkins-bot: [V: 04-1] Add debian/ directory for packaging [software/cumin] (debian) - 10https://gerrit.wikimedia.org/r/338374 (https://phabricator.wikimedia.org/T154588) (owner: 10Volans) [16:06:38] (03PS2) 10Volans: Add debian/ directory for packaging [software/cumin] (debian) - 10https://gerrit.wikimedia.org/r/338374 (https://phabricator.wikimedia.org/T154588) [16:09:41] 06Operations, 10ops-codfw: codfw: ms-be2028-ms-be2039 rack/setup - https://phabricator.wikimedia.org/T158337#3036718 (10Papaul) @fgiunchedi ye we do have a lot room to run both old and new server in parallel [16:11:58] 06Operations, 07Puppet, 10Horizon, 06Labs, 13Patch-For-Review: Puppet tab in Horizon unusably slow - https://phabricator.wikimedia.org/T149589#3036720 (10scfc) AFAIUI, https://horizon.wikimedia.org/ has been updated to Mitaka which shows all roles, regardless of `filtertags`. Clicking on a Puppet tab no... 
[16:15:50] !log restarting cp1074 varnish backend (cron due in 24h, but rep lag looks pretty bad) [16:15:54] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [16:15:57] PROBLEM - puppet last run on analytics1035 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:16:49] 06Operations: Switch to predictable network interface names? - https://phabricator.wikimedia.org/T158429#3036728 (10faidon) [16:16:57] heh "rep lag" so automatic, meant "mb lag" :) [16:18:33] 06Operations, 07Puppet, 10Horizon, 06Labs, 13Patch-For-Review: Puppet tab in Horizon unusably slow - https://phabricator.wikimedia.org/T149589#3036740 (10Paladox) Using th material skin still takes along time to load this tab. So some how the performance improvements weren't done for that skin. [16:19:07] bblack i edited the log and fixed it for you [16:19:12] (03CR) 10Volans: "Test build can be inspected on copper." (031 comment) [software/cumin] (debian) - 10https://gerrit.wikimedia.org/r/338374 (https://phabricator.wikimedia.org/T154588) (owner: 10Volans) [16:20:29] Zppix: thanks :) [16:20:34] bblack no problem [16:21:47] RECOVERY - puppet last run on labvirt1007 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures [16:25:57] RECOVERY - Check Varnish expiry mailbox lag on cp1074 is OK: OK: expiry mailbox lag is 0 [16:26:25] (03CR) 10Faidon Liambotis: [C: 04-1] Add debian/ directory for packaging (0311 comments) [software/cumin] (debian) - 10https://gerrit.wikimedia.org/r/338374 (https://phabricator.wikimedia.org/T154588) (owner: 10Volans) [16:29:27] RECOVERY - puppet last run on ms-be1005 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [16:32:57] PROBLEM - puppet last run on rcs1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [16:33:27] PROBLEM - puppet last run on labservices1002 is CRITICAL: CRITICAL: Puppet has 1 failures. 
Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[/usr/local/bin/labs-ip-alias-dump.py] [16:35:07] RECOVERY - Check Varnish expiry mailbox lag on cp1072 is OK: OK: expiry mailbox lag is 13396 [16:35:20] power nap desperately needed, back in a little while [16:35:45] 06Operations, 10ops-codfw: codfw: ms-be2028-ms-be2039 rack/setup - https://phabricator.wikimedia.org/T158337#3036785 (10Papaul) p:05Triage>03Normal [16:39:50] 06Operations, 06Labs, 06Release-Engineering-Team: contintcloud project thinks it is using 206 fixed-ip quota errantly - https://phabricator.wikimedia.org/T158350#3036804 (10Andrew) Usually you can force quota recalculation with MariaDB [nova]> select * from quota_usages where project_id='contintcloud'; In... [16:40:28] jouncebot next [16:40:56] hmm... jouncebot? [16:41:04] jouncebot help [16:41:04] **** JounceBot Help **** [16:41:04] JounceBot is a deployment helper bot for the Wikimedia Foundation. [16:41:04] You can find my source at https://github.com/mattofak/jouncebot [16:41:04] Available commands: [16:41:04] DIE Kill this bot [16:41:05] HELP Prints the list of all commands known to the server [16:41:05] NEXT Get the next deployment event(s if they happen at the same time) [16:41:06] NOW Get the current deployment event(s) or the time until the next [16:41:06] REFRESH Refresh my knowledge about deployments [16:41:17] weird [16:41:39] sigh, spam the help text to everyone? [16:41:51] lol [16:41:57] godog the next command wasnt working i was confirming was functioning [16:42:02] 06Operations, 10ops-codfw, 06DC-Ops, 10hardware-requests: decom install2001 - https://phabricator.wikimedia.org/T157840#3036808 (10Papaul) a:05Papaul>03RobH The server has been removed from rack and added to decommission sheet. 
[16:42:37] 06Operations, 06Labs, 10wikitech.wikimedia.org: Can't create account "Trizek (WMF)" - https://phabricator.wikimedia.org/T158408#3036058 (10bd808) @Trizek-WMF, I or any other Wikitech admin can make you an account that bypasses the title blacklist rules if you really want it. Typically we don't require or enc... [16:43:19] Zppix: yeah I wasn't pointing fingers, more like observing that only who asks for help is interested in the wall of text back [16:44:57] RECOVERY - puppet last run on analytics1035 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [16:45:05] godog i would of done it in pm but jouncebot doesnt see pms as far as i know [16:45:07] jouncebot next [16:45:13] weird [16:45:25] is there no data for jouncebot to read or something? [16:45:58] jouncebot: refresh [16:46:01] I refreshed my knowledge about deployments. [16:46:04] jouncebot: next [16:46:18] sad_trombone.wav [16:46:29] nope.wav [16:46:48] 06Operations, 06Labs, 10wikitech.wikimedia.org: Can't create account "Trizek (WMF)" - https://phabricator.wikimedia.org/T158408#3036838 (10Trizek-WMF) >>! In T158408#3036810, @bd808 wrote: > @Trizek-WMF, I or any other Wikitech admin can make you an account that bypasses the title blacklist rules if you real... 
[16:51:24] (03PS1) 10Volans: Moved config.yaml to a doc/examples/ directory [software/cumin] - 10https://gerrit.wikimedia.org/r/338382 (https://phabricator.wikimedia.org/T154588) [16:51:29] (03PS1) 10Faidon Liambotis: mirrors: update archvsync to 20170204 [puppet] - 10https://gerrit.wikimedia.org/r/338383 [16:52:31] (03PS1) 10Hashar: Introduce linters using rake [puppet/cdh4] - 10https://gerrit.wikimedia.org/r/338384 (https://phabricator.wikimedia.org/T154894) [16:52:35] (03PS1) 10Hashar: Introduce linters using rake [puppet/kafka] - 10https://gerrit.wikimedia.org/r/338385 (https://phabricator.wikimedia.org/T154894) [16:52:37] (03PS1) 10Hashar: Introduce linters using rake [puppet/nginx] - 10https://gerrit.wikimedia.org/r/338386 (https://phabricator.wikimedia.org/T154894) [16:52:39] (03PS1) 10Hashar: Introduce linters using rake [puppet/wikimetrics] - 10https://gerrit.wikimedia.org/r/338387 (https://phabricator.wikimedia.org/T154894) [16:56:56] 06Operations, 10Phabricator, 06Release-Engineering-Team: Phabricator: Make sure phabricator works properly including our puppet roles on jessie - https://phabricator.wikimedia.org/T158434#3036857 (10Paladox) [16:57:07] c'mon jenkins... I'm waiting for you :) [16:57:25] 06Operations, 06Project-Admins: Operations-related subprojects/tags reorganization - https://phabricator.wikimedia.org/T119944#3036869 (10Aklapper) >>! In T119944#3035777, @Nemo_bis wrote: > can some/all of them become [[https://www.mediawiki.org/wiki/Phabricator/Project_management#Parent_Projects.2C_Subprojec... 
[16:58:32] (03CR) 10Volans: [C: 032] Moved config.yaml to a doc/examples/ directory [software/cumin] - 10https://gerrit.wikimedia.org/r/338382 (https://phabricator.wikimedia.org/T154588) (owner: 10Volans) [16:59:22] (03Merged) 10jenkins-bot: Moved config.yaml to a doc/examples/ directory [software/cumin] - 10https://gerrit.wikimedia.org/r/338382 (https://phabricator.wikimedia.org/T154588) (owner: 10Volans) [17:00:57] RECOVERY - puppet last run on rcs1002 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [17:01:20] 06Operations, 10Phabricator, 06Release-Engineering-Team: Phabricator: Make sure phabricator works properly including our puppet roles on jessie - https://phabricator.wikimedia.org/T158434#3036884 (10Paladox) [17:02:27] RECOVERY - puppet last run on labservices1002 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [17:02:52] (03PS3) 10Volans: Add debian/ directory for packaging [software/cumin] (debian) - 10https://gerrit.wikimedia.org/r/338374 (https://phabricator.wikimedia.org/T154588) [17:04:00] (03CR) 10Hashar: "That will let you easily reproduce what CI does by simply running:" [puppet/nginx] - 10https://gerrit.wikimedia.org/r/338386 (https://phabricator.wikimedia.org/T154894) (owner: 10Hashar) [17:04:19] (03PS4) 10Volans: Add debian/ directory for packaging [software/cumin] (debian) - 10https://gerrit.wikimedia.org/r/338374 (https://phabricator.wikimedia.org/T154588) [17:05:23] volans: do you want a Jenkins job that builds the deb package on patch proposal ? :} [17:06:25] hashar: I think is a more general topic, given that as of now I'll have anyway to rebuild it on copper later [17:07:18] 06Operations, 10hardware-requests: spare ex4200s - check on quantity for potential shipment to OIT - https://phabricator.wikimedia.org/T157839#3036892 (10RobH) 05Open>03Resolved So just 3 switches won't help them, and we cannot steal more spares than that, so they won't need any. 
[17:07:26] volans: then if you get built for you automatically on each patchset, it would help making sure merged patches will build properly on copper later on :} [17:07:53] volans: I will add it as a non-voting one (will not vote verified -1) and you can gauge whether it is any helpful [17:08:11] ah if it's already done, sure, why not [17:08:16] I though you had to work on it [17:08:35] does it uses WIKIMEDIA=yes also? [17:08:37] 06Operations, 06Labs, 10wikitech.wikimedia.org: Can't create account "Trizek (WMF)" - https://phabricator.wikimedia.org/T158408#3036895 (10bd808) >>! In T158408#3036838, @Trizek-WMF wrote: > I prefer to have separate accounts, like I've done for all other accounts. Why is it not encouraged? The technical co... [17:08:45] it uses whatever is in the debian/changelog [17:08:55] no, I mean to build it [17:09:00] so if you get an entry for jessie-wikimedia , yeah that will set WIKIMEDIA=yes [17:09:06] ok [17:09:19] alexandros wrote a bunch of pbuilder hook that recognizes the -wikimedia prefix and set the variable for us automagically [17:09:43] https://gerrit.wikimedia.org/r/#/c/338390/1/zuul/layout.yaml :} [17:09:51] might have to adjust it for the branches though [17:11:52] I'm getting random SErver Unavailable on gerrit [17:12:17] (03CR) 10Hashar: "recheck" [software/cumin] (debian) - 10https://gerrit.wikimedia.org/r/338374 (https://phabricator.wikimedia.org/T154588) (owner: 10Volans) [17:12:29] hashar: thanks, actually should fail as of now :) [17:12:48] indeed https://integration.wikimedia.org/ci/job/debian-glue-non-voting/667/console [17:12:52] 00:00:11.480 gbp:debug: ['git', 'ls-tree', '0.0.1'] [17:12:52] 00:00:11.491 gbp:error: 0.0.1 is not a valid treeish [17:12:53] :( [17:12:56] yep [17:13:04] "expected" ;) [17:13:07] 06Operations, 10ops-codfw: codfw: ms-be2028-ms-be2039 rack/setup - https://phabricator.wikimedia.org/T158337#3036900 (10Papaul) [17:13:11] 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search, and 2 others: 
rack/setup/install elastic1048-1052 - https://phabricator.wikimedia.org/T155790#3036899 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['elastic1021.eqiad.wmnet'] ``` The... [17:13:11] in the sense I'm fixing it [17:13:13] which get fixed by setting upstream/0.0.1 tag [17:13:18] yep [17:13:19] or tweaking debian/gpb.conf somehow :} [17:13:38] so the job basically relies on the cowbuilder images provided by puppet [17:13:47] they are updated via cron on a daily basis [17:13:53] then it clones the repo / checkout the patch [17:14:02] and runs cowbuilder with Alexandros magic hooks [17:14:15] and using whatever dist is mentionned in debian/changelog [17:14:37] ok like copper [17:15:50] (03PS1) 10Gehel: elasticsearch - reimage elastic10(21|22|23|24) to jessie and move data to /srv [puppet] - 10https://gerrit.wikimedia.org/r/338392 (https://phabricator.wikimedia.org/T151326) [17:17:07] !log gehel@puppetmaster1001 conftool action : set/pooled=yes; selector: name=elastic10(17|18|19|20).eqiad.wmnet [17:17:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:17:19] volans: yeah that is the idea [17:17:31] maybe one day we will manage to build the package automatically when a tag is pushed [17:17:35] !log gehel@puppetmaster1001 conftool action : set/pooled=no; selector: name=elastic10(21|22|23|24).eqiad.wmnet [17:17:38] and publish the deb on some staging aptrepo [17:17:38] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [17:17:46] we do that for scap.deb already [17:17:51] (03CR) 10Gehel: [C: 032] elasticsearch - reimage elastic10(21|22|23|24) to jessie and move data to /srv [puppet] - 10https://gerrit.wikimedia.org/r/338392 (https://phabricator.wikimedia.org/T151326) (owner: 10Gehel) [17:19:16] 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic1048-1052 - https://phabricator.wikimedia.org/T155790#3036916 
(10ops-monitoring-bot) Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['elastic1022.eqiad.wmnet'] ``` The... [17:22:34] 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic1048-1052 - https://phabricator.wikimedia.org/T155790#3036919 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['elastic1024.eqiad.wmnet'] ``` The... [17:28:57] (03CR) 10Volans: "recheck" (0311 comments) [software/cumin] (debian) - 10https://gerrit.wikimedia.org/r/338374 (https://phabricator.wikimedia.org/T154588) (owner: 10Volans) [17:34:32] (03CR) 10Volans: "recheck" [software/cumin] (debian) - 10https://gerrit.wikimedia.org/r/338374 (https://phabricator.wikimedia.org/T154588) (owner: 10Volans) [17:36:50] 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic1048-1052 - https://phabricator.wikimedia.org/T155790#3036927 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['elastic1021.eqiad.wmnet'] ``` and were **ALL** successful. [17:40:13] 06Operations, 10Phabricator, 06Release-Engineering-Team: Phabricator: Make sure phabricator works properly including our puppet roles on jessie - https://phabricator.wikimedia.org/T158434#3036948 (10Paladox) p:05Triage>03High [17:41:39] 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic1048-1052 - https://phabricator.wikimedia.org/T155790#3036949 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['elastic1022.eqiad.wmnet'] ``` and were **ALL** successful. 
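Note on the debian-glue exchange above (17:08 to 17:14): the job is described as building for whatever distribution the topmost debian/changelog entry names, with a -wikimedia distribution such as jessie-wikimedia switching on WIKIMEDIA=yes. A minimal Python sketch of that decision is given here. It is not the actual pbuilder hook code, which is not shown in this log; the function names, the DIST key, and the sample changelog line are hypothetical, and it assumes the standard changelog header form `package (version) distribution; urgency=...` with the -wikimedia marker at the end of the distribution name.

```python
import re

# Header line of a debian/changelog entry, e.g.
#   "cumin (0.0.1-1) jessie-wikimedia; urgency=medium"
# (the sample version string is made up for illustration).
_HEADER = re.compile(r"^(?P<package>\S+) \((?P<version>[^)]+)\) (?P<dists>[^;]+);")


def changelog_distribution(changelog_text):
    """Return the first distribution named in the topmost changelog entry."""
    for line in changelog_text.splitlines():
        if not line.strip():
            continue  # skip leading blank lines
        match = _HEADER.match(line)
        if match is None:
            raise ValueError("not a changelog header: %r" % line)
        return match.group("dists").split()[0]
    raise ValueError("empty changelog")


def build_environment(distribution):
    """Hypothetical stand-in for the hooks: derive build variables from the dist.

    Only the WIKIMEDIA=yes behaviour comes from the conversation; the DIST key
    is an assumption made for this sketch.
    """
    env = {"DIST": distribution}
    if distribution.endswith("-wikimedia"):
        env["WIKIMEDIA"] = "yes"
    return env


if __name__ == "__main__":
    sample = "cumin (0.0.1-1) jessie-wikimedia; urgency=medium\n\n  * Initial packaging.\n"
    print(build_environment(changelog_distribution(sample)))
```

Reading the distribution from the changelog rather than from a job parameter keeps the CI build aligned with what a manual package build would target, which matches the "ok like copper" remark.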
[17:42:38] 06Operations, 10ops-eqiad, 10Phabricator, 06Release-Engineering-Team, 10hardware-requests: replacement hardware for iridium (phabricator) - https://phabricator.wikimedia.org/T156970#3036950 (10Paladox) [18:07:44] 06Operations, 06Labs, 06Release-Engineering-Team: contintcloud project thinks it is using 206 fixed-ip quota errantly - https://phabricator.wikimedia.org/T158350#3036994 (10Andrew) I restarted nova-network and it looks like nova is cleaning up those leaks now. I'll keep an eye out, but I've reduced the quot... [18:15:48] (03PS5) 10Volans: Add debian/ directory for packaging [software/cumin] (debian) - 10https://gerrit.wikimedia.org/r/338374 (https://phabricator.wikimedia.org/T154588) [18:23:20] 06Operations, 06Labs, 10netops: asw-c2-eqiad reboots & fdb_mac_entry_mc_set() issues - https://phabricator.wikimedia.org/T155875#3037010 (10faidon) 05Open>03Resolved a:03faidon The "Sanity Checks Failed" log messages continue to happen sporadically but we haven't had a switch failure in over 3 weeks no... [18:25:51] 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic1048-1052 - https://phabricator.wikimedia.org/T155790#3037014 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['elastic1024.eqiad.wmnet'] ``` and were **ALL** successful. 
[18:34:08] !log gehel@puppetmaster1001 conftool action : set/pooled=yes; selector: name=elastic10(21|22|24).eqiad.wmnet [18:34:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:38:49] (03CR) 10BryanDavis: [C: 031] toollabs: Update tools.wmflabs.org links to use HTTPS [puppet] - 10https://gerrit.wikimedia.org/r/338342 (owner: 10Legoktm) [18:40:38] (03PS1) 10Andrew Bogott: Nova: Turn off Verbose logging [puppet] - 10https://gerrit.wikimedia.org/r/338397 (https://phabricator.wikimedia.org/T158350) [18:46:45] (03PS2) 10Andrew Bogott: toollabs: Update tools.wmflabs.org links to use HTTPS [puppet] - 10https://gerrit.wikimedia.org/r/338342 (owner: 10Legoktm) [18:47:23] (03CR) 10Rush: [C: 031] Nova: Turn off Verbose logging [puppet] - 10https://gerrit.wikimedia.org/r/338397 (https://phabricator.wikimedia.org/T158350) (owner: 10Andrew Bogott) [18:50:44] (03CR) 10Andrew Bogott: [C: 032] toollabs: Update tools.wmflabs.org links to use HTTPS [puppet] - 10https://gerrit.wikimedia.org/r/338342 (owner: 10Legoktm) [18:51:04] (03PS2) 10Andrew Bogott: Nova: Turn off Verbose logging [puppet] - 10https://gerrit.wikimedia.org/r/338397 (https://phabricator.wikimedia.org/T158350) [18:56:17] (03CR) 10Andrew Bogott: [C: 032] Nova: Turn off Verbose logging [puppet] - 10https://gerrit.wikimedia.org/r/338397 (https://phabricator.wikimedia.org/T158350) (owner: 10Andrew Bogott) [19:04:05] (03CR) 10Andrew Bogott: [C: 032] Tools: Update list of host aliases for mail relay [puppet] - 10https://gerrit.wikimedia.org/r/326308 (owner: 10Tim Landscheidt) [19:04:13] (03PS3) 10Andrew Bogott: Tools: Update list of host aliases for mail relay [puppet] - 10https://gerrit.wikimedia.org/r/326308 (owner: 10Tim Landscheidt) [19:05:41] 06Operations, 06Labs, 06Release-Engineering-Team, 13Patch-For-Review: contintcloud project thinks it is using 206 fixed-ip quota errantly - https://phabricator.wikimedia.org/T158350#3037145 (10Andrew) 05Open>03Resolved I cleaned up about 100 leaks, 
like this: update fixed_ips a, instances b set a.inst... [19:09:13] 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic1048-1052 - https://phabricator.wikimedia.org/T155790#3037149 (10ops-monitoring-bot) Script wmf_auto_reimage was launched by gehel on neodymium.eqiad.wmnet for hosts: ``` ['elastic1023.eqiad.wmnet'] ``` The... [19:20:25] !log upgrading maps-test2004 to nodejs6 for testing - T150354 [19:20:30] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [19:20:30] T150354: Implement Node6 support for Kartotherian/Tilerator - https://phabricator.wikimedia.org/T150354 [19:20:35] PROBLEM - puppet last run on mira is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [19:24:02] 06Operations: Harmonise "Directory Managers" group - https://phabricator.wikimedia.org/T157131#3037195 (10Andrew) Same here -- I use my own account but it won't kill me to look up the manager password instead. [19:24:23] (03PS2) 10Rush: labstore: nfs-mount-manager throw notice if symlink [puppet] - 10https://gerrit.wikimedia.org/r/338351 [19:27:39] PROBLEM - puppet last run on labnet1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [19:29:10] ^ andrewbogott from logging change? 
[19:29:17] or hm [19:29:31] hm, I don't know [19:29:35] I'll look if you aren't already [19:29:59] looking [19:30:39] (03CR) 10EBernhardson: Update elasticsearch module for es5 compatability (034 comments) [puppet] - 10https://gerrit.wikimedia.org/r/333969 (https://phabricator.wikimedia.org/T155578) (owner: 10EBernhardson) [19:30:39] RECOVERY - puppet last run on labnet1002 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures [19:30:54] (03CR) 10Rush: [C: 032] labstore: nfs-mount-manager throw notice if symlink [puppet] - 10https://gerrit.wikimedia.org/r/338351 (owner: 10Rush) [19:31:15] (03PS8) 10EBernhardson: Update elasticsearch module for es5 compatability [puppet] - 10https://gerrit.wikimedia.org/r/333969 (https://phabricator.wikimedia.org/T155578) [19:31:34] andrewbogott: seems fully transient and hit mira at same time too [19:31:40] ok [19:32:49] 06Operations, 10ops-eqiad, 06Discovery, 06Discovery-Search, and 2 others: rack/setup/install elastic1048-1052 - https://phabricator.wikimedia.org/T155790#3037224 (10ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['elastic1023.eqiad.wmnet'] ``` and were **ALL** successful. [19:34:09] 06Operations, 10Ops-Access-Requests, 06Research-and-Data, 13Patch-For-Review: Cluster Access for Nithum Thain - https://phabricator.wikimedia.org/T157724#3037227 (10Nithum) Seems to be working. Thanks for all of the help everyone! [19:35:27] (03PS9) 10EBernhardson: Update elasticsearch module for es5 compatability [puppet] - 10https://gerrit.wikimedia.org/r/333969 (https://phabricator.wikimedia.org/T155578) [19:40:29] PROBLEM - Unmerged changes on repository puppet on puppetmaster1001 is CRITICAL: There is one unmerged change in puppet (dir /var/lib/git/operations/puppet, ref HEAD..origin/production). [19:42:37] ^me [19:43:29] RECOVERY - Unmerged changes on repository puppet on puppetmaster1001 is OK: No changes to merge. 
[19:44:54] (03CR) 10EBernhardson: "puppet compiler output: http://puppet-compiler.wmflabs.org/5504/" [puppet] - 10https://gerrit.wikimedia.org/r/333969 (https://phabricator.wikimedia.org/T155578) (owner: 10EBernhardson) [19:48:39] RECOVERY - puppet last run on mira is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures [20:00:29] PROBLEM - puppet last run on db1039 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [20:14:43] spam warning, here it comes... [20:16:27] (03PS3) 10ArielGlenn: little tool that displays the last page id in bz2 xml content file [dumps/mwbzutils] - 10https://gerrit.wikimedia.org/r/337341 [20:16:55] (03PS2) 10ArielGlenn: tiny util to get last revision id from bz2 xml content dump file [dumps/mwbzutils] - 10https://gerrit.wikimedia.org/r/337863 [20:17:19] (03PS1) 10ArielGlenn: set bfile.marker to NULL in a bunch of places [dumps/mwbzutils] - 10https://gerrit.wikimedia.org/r/338402 [20:17:34] (03PS2) 10ArielGlenn: write results from getlastpageid and getlastrevid to stdout, not stderr [dumps/mwbzutils] - 10https://gerrit.wikimedia.org/r/338280 [20:17:58] (03PS2) 10ArielGlenn: update .gitignore with the binaries for the new utilities [dumps/mwbzutils] - 10https://gerrit.wikimedia.org/r/338281 [20:18:00] (03PS2) 10ArielGlenn: script to check whether page range of bz2 checkpoint file is correct [dumps/mwbzutils] - 10https://gerrit.wikimedia.org/r/338282 [20:18:02] (03PS1) 10ArielGlenn: combine the getlastpageid and getlastrevid utils into one [dumps/mwbzutils] - 10https://gerrit.wikimedia.org/r/338403 [20:18:04] (03PS1) 10ArielGlenn: remove getlastpageid and getlastrevid source files, fix up Makefile [dumps/mwbzutils] - 10https://gerrit.wikimedia.org/r/338404 [20:18:06] (03PS1) 10ArielGlenn: remove page_info_t now that we have id_info_t, and convert utils using it [dumps/mwbzutils] - 10https://gerrit.wikimedia.org/r/338405 [20:18:08] (03PS1) 10ArielGlenn: update README with 
docs on the new utility and script [dumps/mwbzutils] - 10https://gerrit.wikimedia.org/r/338406 [20:18:10] (03PS1) 10ArielGlenn: add function to dump parts of bz_info_t structure [dumps/mwbzutils] - 10https://gerrit.wikimedia.org/r/338407 [20:18:12] (03PS1) 10ArielGlenn: clean up some gcc warnings from unused variables and such [dumps/mwbzutils] - 10https://gerrit.wikimedia.org/r/338408 [20:18:14] (03PS1) 10ArielGlenn: bump version to 0.0.6 [dumps/mwbzutils] - 10https://gerrit.wikimedia.org/r/338409 [20:18:17] done [20:28:29] RECOVERY - puppet last run on db1039 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [20:31:50] (03PS1) 10Faidon Liambotis: aptrepo: kill ref to cloudera from precise-wikimedia [puppet] - 10https://gerrit.wikimedia.org/r/338412 [20:32:08] (03CR) 10Faidon Liambotis: [V: 032 C: 032] aptrepo: kill ref to cloudera from precise-wikimedia [puppet] - 10https://gerrit.wikimedia.org/r/338412 (owner: 10Faidon Liambotis) [20:38:49] (03PS1) 10Yuvipanda: docker: Pin upstream version import between 1.12 and 1.13 [puppet] - 10https://gerrit.wikimedia.org/r/338414 [20:39:04] paravoid: ^ [20:39:56] (03PS2) 10Faidon Liambotis: docker: Pin upstream version import between 1.12 and 1.13 [puppet] - 10https://gerrit.wikimedia.org/r/338414 (owner: 10Yuvipanda) [20:40:01] (03CR) 10Faidon Liambotis: [C: 032] docker: Pin upstream version import between 1.12 and 1.13 [puppet] - 10https://gerrit.wikimedia.org/r/338414 (owner: 10Yuvipanda) [20:43:39] PROBLEM - Redis replication status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479 [20:44:39] RECOVERY - Redis replication status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 3697732 keys, up 109 days 12 hours - replication_delay is 0 [20:51:49] PROBLEM - wikidata.org dispatch lag is higher than 300s on wikidata is CRITICAL: HTTP CRITICAL: HTTP/1.1 200 OK - pattern not found - 1894 bytes in 
0.106 second response time [20:53:03] (03PS1) 10Yuvipanda: k8s: Upgrade docker + turn on live-restore for prod [puppet] - 10https://gerrit.wikimedia.org/r/338416 (https://phabricator.wikimedia.org/T157180) [20:56:37] (03PS2) 10Yuvipanda: k8s: Upgrade docker + turn on live-restore for prod [puppet] - 10https://gerrit.wikimedia.org/r/338416 (https://phabricator.wikimedia.org/T157180) [20:56:44] (03CR) 10Yuvipanda: [V: 032 C: 032] k8s: Upgrade docker + turn on live-restore for prod [puppet] - 10https://gerrit.wikimedia.org/r/338416 (https://phabricator.wikimedia.org/T157180) (owner: 10Yuvipanda) [20:56:49] RECOVERY - wikidata.org dispatch lag is higher than 300s on wikidata is OK: HTTP OK: HTTP/1.1 200 OK - 1890 bytes in 0.090 second response time [21:11:51] 06Operations, 10CirrusSearch, 06Discovery, 10Elasticsearch, and 3 others: Puppet changes required for elasticsearch 5.x upgrade - https://phabricator.wikimedia.org/T155578#3037355 (10EBernhardson) Last time we upgraded (1.7->2.x) we had some annoying issues with the .deb package versions. We were only able... [21:12:45] whats going on with gerrit out of curiosity (i noticed the channel topic) [21:12:52] jouncebot now [21:14:02] someone should probably check that out.. [21:18:19] 06Operations, 10Tool-Labs-tools-Other: Jouncebot: Crashes when issued a command. - https://phabricator.wikimedia.org/T158448#3037357 (10Zppix) [21:21:15] (03PS8) 10Nuria: Changes to perf consumer of event logging events [puppet] - 10https://gerrit.wikimedia.org/r/337158 (https://phabricator.wikimedia.org/T156760) [21:21:25] (03PS3) 10Nuria: navtiming: Make tests easier to extend [puppet] - 10https://gerrit.wikimedia.org/r/338044 (owner: 10Krinkle) [21:21:53] 06Operations, 10Stashbot, 10Tool-Labs-tools-Other: Jouncebot: Crashes when issued a command. - https://phabricator.wikimedia.org/T158448#3037387 (10Paladox) [21:22:01] jouncebot now [21:23:30] 06Operations, 10Stashbot, 10Tool-Labs-tools-Other: Jouncebot: Crashes when issued a command. 
- https://phabricator.wikimedia.org/T158448#3037357 (10bd808) ``` ERROR:root:Unhandled exception. Terminating. Traceback (most recent call last): File "./jouncebot/jouncebot.py", line 281, in bot.start... [21:24:57] 06Operations, 10Tool-Labs-tools-Other: Jouncebot: Crashes when issued a command. - https://phabricator.wikimedia.org/T158448#3037394 (10Paladox) [21:25:48] jouncebot: refresh [21:25:51] I refreshed my knowledge about deployments. [21:25:56] jouncebot now [21:28:18] (03PS1) 10Yuvipanda: tools: Allow setting up k8s master without tools star cert [puppet] - 10https://gerrit.wikimedia.org/r/338429 [21:28:37] chasemp: ^ do you have a way to cherrypick things for your test or want me to test and merge this rn? [21:29:08] yuvipanda: I don't easily, if we can merge that would be cleanest [21:29:27] chasemp: ok, let me cherrypick on tools to test and then I'll merge [21:29:31] sure [21:29:45] yuvipanda: if you want to hold till mon/tue that's cool [21:30:15] chasemp: no no let's do it [21:30:16] should be a noop [21:30:18] kk [21:30:21] I'm down [21:30:25] chasemp: I want ot get this out of the way asap :D [21:30:30] (03PS2) 10Zppix: tools: Allow setting up k8s master without tools star cert [puppet] - 10https://gerrit.wikimedia.org/r/338429 (owner: 10Yuvipanda) [21:30:31] yup [21:34:15] !log gehel@puppetmaster1001 conftool action : set/pooled=yes; selector: name=elastic1023.eqiad.wmnet [21:34:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [21:34:24] (03PS3) 10Yuvipanda: tools: Allow setting up k8s master without tools star cert [puppet] - 10https://gerrit.wikimedia.org/r/338429 [21:34:41] (03PS1) 10BryanDavis: Guard against empty upcoming list [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/338440 (https://phabricator.wikimedia.org/T158448) [21:36:05] (03PS4) 10Yuvipanda: tools: Allow setting up k8s master without tools star cert [puppet] - 10https://gerrit.wikimedia.org/r/338429 
(https://phabricator.wikimedia.org/T158452) [21:37:43] (03CR) 10Yuvipanda: [V: 032 C: 032] tools: Allow setting up k8s master without tools star cert [puppet] - 10https://gerrit.wikimedia.org/r/338429 (https://phabricator.wikimedia.org/T158452) (owner: 10Yuvipanda) [21:37:59] chasemp: merged [21:38:13] yuvipanda: kk [21:45:30] (03CR) 10jerkins-bot: [V: 04-1] Guard against empty upcoming list [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/338440 (https://phabricator.wikimedia.org/T158448) (owner: 10BryanDavis) [21:45:56] 06Operations, 10ops-ulsfo, 10netops: lvs4002 power supply failure - https://phabricator.wikimedia.org/T151273#3037470 (10RobH) a:05RobH>03BBlack I'm assigning this task to Brandon for followup. In IRC, we discussed that he would likely fail ulsfo over to a 3 lvs system setup. I'm not sure if there is a... [21:46:29] (03PS2) 10BryanDavis: Guard against empty upcoming list [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/338440 (https://phabricator.wikimedia.org/T158448) [21:46:39] PROBLEM - Disk space on elastic1022 is CRITICAL: DISK CRITICAL - free space: /srv 61440 MB (12% inode=99%) [21:47:15] gehel: ^ [21:47:17] You about? [21:47:42] I'm guessing i can ignore all of those since you seem to be working with them today? [21:49:57] robh: yeah, thanks! Lots of movement on the cluster, and shards not moving off fast enough, but mostly expected... 
[21:50:20] Cool, I just didn't want to assume without confirmation, seemed like asking for trouble =] [21:52:15] (03PS2) 10Yuvipanda: toolserver_legacy: Add redirect for ~wiegels/wikipedia-termine.php [puppet] - 10https://gerrit.wikimedia.org/r/336764 (https://phabricator.wikimedia.org/T62888) (owner: 10Tim Landscheidt) [21:52:27] (03CR) 10Yuvipanda: [V: 032 C: 032] toolserver_legacy: Add redirect for ~wiegels/wikipedia-termine.php [puppet] - 10https://gerrit.wikimedia.org/r/336764 (https://phabricator.wikimedia.org/T62888) (owner: 10Tim Landscheidt) [21:52:37] 06Operations: re-create install2001 as a VM - https://phabricator.wikimedia.org/T156440#3037480 (10RobH) [21:52:39] 06Operations, 10ops-codfw, 06DC-Ops, 10hardware-requests: decom install2001 - https://phabricator.wikimedia.org/T157840#3037478 (10RobH) 05Open>03Resolved Ok, resolving this task, as I just removed install2001 from the description of the disabled port asw-a-codfw:ge-5/0/11. I also removed it from the... [21:52:46] 06Operations, 10ops-codfw, 06DC-Ops, 10hardware-requests: decom install2001 - https://phabricator.wikimedia.org/T157840#3037481 (10RobH) [21:56:05] (03PS4) 10Yuvipanda: Revert "tools: store verbose logrotate logs" [puppet] - 10https://gerrit.wikimedia.org/r/329217 (https://phabricator.wikimedia.org/T96007) (owner: 10Tim Landscheidt) [21:56:44] (03CR) 10Yuvipanda: [V: 032 C: 032] "@scfc thanks for the patch! I'll run clush right after merging this." 
[puppet] - 10https://gerrit.wikimedia.org/r/329217 (https://phabricator.wikimedia.org/T96007) (owner: 10Tim Landscheidt) [21:58:51] (03PS1) 10EBernhardson: Remove non-existent setting from apifeatureusage logstash template [puppet] - 10https://gerrit.wikimedia.org/r/338469 [22:02:33] (03PS1) 10Yuvipanda: k8s: Fix puppet resource conflict that is absolutely stupid [puppet] - 10https://gerrit.wikimedia.org/r/338470 [22:02:42] chasemp: ^ [22:03:01] chasemp: it's puppet being stupid [22:03:17] (03PS2) 10Yuvipanda: k8s: Fix puppet resource conflict that is absolutely stupid [puppet] - 10https://gerrit.wikimedia.org/r/338470 [22:03:27] (03CR) 10Yuvipanda: [V: 032 C: 032] k8s: Fix puppet resource conflict that is absolutely stupid [puppet] - 10https://gerrit.wikimedia.org/r/338470 (owner: 10Yuvipanda) [22:04:01] ah hm [22:04:19] PROBLEM - puppet last run on cerium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [22:04:21] chasemp: merged, try in a min [22:04:39] kk [22:13:31] (03PS1) 10Yuvipanda: k8s: Attempt to fix puppet circular dependency cycle [puppet] - 10https://gerrit.wikimedia.org/r/338471 [22:13:44] (03PS2) 10Yuvipanda: k8s: Attempt to fix puppet circular dependency cycle [puppet] - 10https://gerrit.wikimedia.org/r/338471 [22:13:51] chasemp: ^ might fix [22:13:56] I misread the original error [22:14:26] should we roll back the first version? 
[22:15:03] chasemp: yeah, I did as part of this [22:15:07] (03CR) 10Yuvipanda: [V: 032 C: 032] k8s: Attempt to fix puppet circular dependency cycle [puppet] - 10https://gerrit.wikimedia.org/r/338471 (owner: 10Yuvipanda) [22:15:16] well, let me read things before I speak then :) [22:25:39] PROBLEM - Disk space on elastic1022 is CRITICAL: DISK CRITICAL - free space: /srv 60498 MB (12% inode=99%) [22:32:19] RECOVERY - puppet last run on cerium is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [22:41:39] RECOVERY - Disk space on elastic1022 is OK: DISK OK [22:57:09] PROBLEM - puppet last run on mw1170 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:18:42] (03PS1) 10Smalyshev: Bump timeout to 1 minute [puppet] - 10https://gerrit.wikimedia.org/r/338473 (https://phabricator.wikimedia.org/T158184) [23:25:39] PROBLEM - puppet last run on iridium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [23:26:09] RECOVERY - puppet last run on mw1170 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures [23:26:24] twentyafterfour ^^ [23:27:35] paladox: it's been doing that periodically, and it's not anything specific to iridium so nothing you need to be worried about [23:27:44] (or nothing I need to be worried about, really :D) [23:27:46] ok [23:39:16] (03CR) 10BryanDavis: [C: 031] "In theory the index templates are applied by Logstash via the `template` parameter of the `elasticsearch` output plugin. In practice usual" [puppet] - 10https://gerrit.wikimedia.org/r/338469 (owner: 10EBernhardson) [23:50:49] (03CR) 10Paladox: [C: 031] Guard against empty upcoming list [wikimedia/bots/jouncebot] - 10https://gerrit.wikimedia.org/r/338440 (https://phabricator.wikimedia.org/T158448) (owner: 10BryanDavis) [23:53:39] RECOVERY - puppet last run on iridium is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures