[00:00:21] godog: i think it's all ok for now, thanks again for offering [00:20:28] mutante: no worries -- thanks for taking care of that! [00:26:52] (03CR) 10Krinkle: [] tests: Use cp0 and srv0 instead of cp1/mw1 for sample data (031 comment) [software/conftool] - 10https://gerrit.wikimedia.org/r/327686 (owner: 10Krinkle) [00:30:36] (03PS3) 10Krinkle: tests: Use sample data that doesn't match production names [software/conftool] - 10https://gerrit.wikimedia.org/r/327686 [00:31:17] (03CR) 10jenkins-bot: [V: 04-1] tests: Use sample data that doesn't match production names [software/conftool] - 10https://gerrit.wikimedia.org/r/327686 (owner: 10Krinkle) [00:33:23] PROBLEM - puppet last run on analytics1054 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [00:50:40] 06Operations, 06Labs, 10Tool-Labs: puppetize legacy toolserver mail aliases - https://phabricator.wikimedia.org/T153510#2882918 (10Dzahn) [00:50:49] Attention: Scheduled maintenance for grrrit-wm in 9 minutes !!! [00:50:59] 06Operations, 06Labs, 10Tool-Labs: puppetize legacy toolserver mail aliases - https://phabricator.wikimedia.org/T153510#2882933 (10Dzahn) [01:02:23] RECOVERY - puppet last run on analytics1054 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [01:03:44] PROBLEM - puppet last run on rcs1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [01:03:44] Attention: Beer'o'clock was 3 minutes ago. cu later [01:11:28] mutante: quick question how do I run a dockerfile on kubectl [01:17:40] Zppix you doint [01:17:54] How do i [01:18:01] Rerun the kubectl job [01:18:05] You doint, you have to be an admin [01:18:31] Shit...
[01:19:09] Plan b node cmd [01:19:20] that won't work [01:19:37] Then wtf do i do [01:20:07] you will need to ask -labs for help with that as i have never stopped the bot from running permanently [01:20:17] I'll run node temp [01:20:22] Is that ok [01:21:01] no, i doint think it will work [01:31:43] RECOVERY - puppet last run on rcs1002 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures [01:37:26] doint [01:40:54] PROBLEM - puppet last run on elastic1024 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [01:44:42] 06Operations, 10scap, 03Scap3: Trying to scap while l10nupdate is syncing shows unhelpful error - https://phabricator.wikimedia.org/T153278#2882972 (10thcipriani) p:05Triage>03Low Most likely the easiest way to take care of this would be to add a message in `l10nupdate-1` (https://github.com/wikimedia/op... [01:48:03] PROBLEM - Postgres Replication Lag on maps2002 is CRITICAL: CRITICAL - Rep Delay is: 1805.255874 Seconds [01:49:03] RECOVERY - Postgres Replication Lag on maps2002 is OK: OK - Rep Delay is: 24.353207 Seconds [01:51:18] 06Operations, 10ops-eqiad, 06Discovery, 10Wikidata, 10Wikidata-Query-Service: rack/setup/install wdqs1003 - https://phabricator.wikimedia.org/T152643#2883006 (10Smalyshev) [01:58:03] PROBLEM - puppet last run on cp3048 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [02:08:53] RECOVERY - puppet last run on elastic1024 is OK: OK: Puppet is currently enabled, last run 44 seconds ago with 0 failures [02:17:34] PROBLEM - puppet last run on elastic1041 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues [02:26:03] RECOVERY - puppet last run on cp3048 is OK: OK: Puppet is currently enabled, last run 7 seconds ago with 0 failures [02:29:51] (03CR) 10Dzahn: [C: 031] Add jdlrobson to the deploy-service group [puppet] - 10https://gerrit.wikimedia.org/r/327755 (https://phabricator.wikimedia.org/T153458) (owner: 10Mobrovac) [02:30:30] (03CR) 10Dzahn: [C: 031] Trending Edits: Add the admin group (and add it to SCB) [puppet] - 10https://gerrit.wikimedia.org/r/327754 (https://phabricator.wikimedia.org/T153458) (owner: 10Mobrovac) [02:32:51] !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.6) (duration: 13m 14s) [02:33:03] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:34:30] PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Puppet has 32 failures. Last run 2 minutes ago with 32 failures. Failed resources (up to 3 shown): Package[sysstat],Package[lldpd],Package[ncdu],Package[dstat] [02:37:21] !log l10nupdate@tin ResourceLoader cache refresh completed at Sat Dec 17 02:37:21 UTC 2016 (duration 4m 30s) [02:37:32] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [02:46:33] RECOVERY - puppet last run on elastic1041 is OK: OK: Puppet is currently enabled, last run 49 seconds ago with 0 failures [03:01:00] (03CR) 10Yuvipanda: [] [WIP] maintain-dbusers.py for maintaining labsdb users (036 comments) [puppet] - 10https://gerrit.wikimedia.org/r/327157 (owner: 10Yuvipanda) [03:02:23] RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [03:24:07] 06Operations, 06Labs, 10Tool-Labs: puppetize legacy toolserver mail aliases - https://phabricator.wikimedia.org/T153510#2883100 (10scfc) [03:33:04] (also msg'd -labs) is there anything going on with the network right now? [03:33:17] can't seem to git clone gerrit, cloning GitHub is happening at ~3KiB/s [03:33:32] to clarify, I'm on the tool labs console. 
[03:36:33] PROBLEM - puppet last run on prometheus1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [04:05:33] RECOVERY - puppet last run on prometheus1001 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures [04:14:13] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=536.10 Read Requests/Sec=1452.50 Write Requests/Sec=165.10 KBytes Read/Sec=19082.00 KBytes_Written/Sec=2405.20 [04:14:43] PROBLEM - puppet last run on labsdb1011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [04:15:13] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=214.50 Read Requests/Sec=59.60 Write Requests/Sec=0.50 KBytes Read/Sec=7542.00 KBytes_Written/Sec=9.60 [04:18:13] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=2779.70 Read Requests/Sec=3210.31 Write Requests/Sec=9.11 KBytes Read/Sec=15610.41 KBytes_Written/Sec=2994.19 [04:27:13] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=8.00 Read Requests/Sec=0.00 Write Requests/Sec=0.20 KBytes Read/Sec=0.00 KBytes_Written/Sec=4.40 [04:42:43] RECOVERY - puppet last run on labsdb1011 is OK: OK: Puppet is currently enabled, last run 16 seconds ago with 0 failures [04:51:03] PROBLEM - puppet last run on analytics1052 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [05:09:53] PROBLEM - puppet last run on sca2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [05:11:23] PROBLEM - puppet last run on ms-be1012 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [05:20:03] RECOVERY - puppet last run on analytics1052 is OK: OK: Puppet is currently enabled, last run 8 seconds ago with 0 failures [05:35:42] 06Operations, 06Commons, 10TimedMediaHandler-Transcode, 10Wikimedia-Video: Commons video transcoders have over 6500 tasks in the backlog. - https://phabricator.wikimedia.org/T153488#2883263 (10zhuyifei1999) I think this needs attention from #operations [05:37:53] RECOVERY - puppet last run on sca2004 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures [05:39:35] RECOVERY - puppet last run on ms-be1012 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [05:39:47] 06Operations, 10Wikimedia-General-or-Unknown, 10hardware-requests: Extend capacity for video scalers - https://phabricator.wikimedia.org/T150067#2883269 (10zhuyifei1999) [05:39:49] 06Operations, 10TimedMediaHandler, 10hardware-requests: Assign 3 more servers to video scaler duty - https://phabricator.wikimedia.org/T114337#2883270 (10zhuyifei1999) [05:39:52] 06Operations, 06Commons, 10TimedMediaHandler-Transcode, 10Wikimedia-Video: Commons video transcoders have over 6500 tasks in the backlog. - https://phabricator.wikimedia.org/T153488#2883268 (10zhuyifei1999) [05:58:43] PROBLEM - puppet last run on pc1006 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [06:06:13] PROBLEM - Host mr1-ulsfo.oob is DOWN: PING CRITICAL - Packet loss = 100% [06:11:23] RECOVERY - Host mr1-ulsfo.oob is UP: PING OK - Packet loss = 0%, RTA = 72.95 ms [06:26:43] RECOVERY - puppet last run on pc1006 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures [06:42:13] PROBLEM - puppet last run on cp4007 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [07:06:23] 06Operations, 06Commons, 10TimedMediaHandler-Transcode, 10Wikimedia-Video: Commons video transcoders have over 6500 tasks in the backlog. - https://phabricator.wikimedia.org/T153488#2882187 (10Pokefan95) I am going to mark all server-side upload Phabricator tasks which are not yet performed as stalled. [07:10:13] RECOVERY - puppet last run on cp4007 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures [07:11:37] (03PS1) 10Smalyshev: Add new units for the following: [mediawiki-config] - 10https://gerrit.wikimedia.org/r/327907 (https://phabricator.wikimedia.org/T150881) [07:12:03] 06Operations, 10MediaWiki-Internationalization: Norwegian messages inContentLanguage look for on-wiki overrides at the /nb subpage, not the root page - https://phabricator.wikimedia.org/T126146#2883330 (10Krinkle) >>! In T126146#2880170, @thiemowmde wrote: > "Sites" is not a first-class Wikibase concept. It wa... [07:12:07] (03Restored) 10Krinkle: Set valid content language for Norwegian wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/277519 (https://phabricator.wikimedia.org/T126146) (owner: 10Nikerabbit) [07:12:17] (03PS2) 10Krinkle: Set valid content language for Norwegian wikis [mediawiki-config] - 10https://gerrit.wikimedia.org/r/277519 (https://phabricator.wikimedia.org/T126146) (owner: 10Nikerabbit) [07:13:14] (03CR) 10Krinkle: [C: 04-1] "Pending reply at https://phabricator.wikimedia.org/T126146#2883330" [mediawiki-config] - 10https://gerrit.wikimedia.org/r/277519 (https://phabricator.wikimedia.org/T126146) (owner: 10Nikerabbit) [07:13:36] (03PS2) 10Smalyshev: Add new units for the following: [mediawiki-config] - 10https://gerrit.wikimedia.org/r/327907 (https://phabricator.wikimedia.org/T150881) [07:32:03] PROBLEM - puppet last run on eeden is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [07:58:33] PROBLEM - High load average on labstore1003 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [24.0] [08:01:03] RECOVERY - puppet last run on eeden is OK: OK: Puppet is currently enabled, last run 52 seconds ago with 0 failures [08:01:33] RECOVERY - High load average on labstore1003 is OK: OK: Less than 50.00% above the threshold [16.0] [08:04:33] PROBLEM - High load average on labstore1003 is CRITICAL: CRITICAL: 55.56% of data above the critical threshold [24.0] [08:05:34] RECOVERY - High load average on labstore1003 is OK: OK: Less than 50.00% above the threshold [16.0] [08:07:43] PROBLEM - puppet last run on radium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [08:21:13] PROBLEM - puppet last run on elastic1035 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [08:29:29] (03CR) 10TTO: [] Set valid content language for Norwegian wikis (031 comment) [mediawiki-config] - 10https://gerrit.wikimedia.org/r/277519 (https://phabricator.wikimedia.org/T126146) (owner: 10Nikerabbit) [08:31:23] 06Operations, 10MediaWiki-Internationalization: Norwegian messages inContentLanguage look for on-wiki overrides at the /nb subpage, not the root page - https://phabricator.wikimedia.org/T126146#2883359 (10Nemo_bis) [08:34:44] RECOVERY - puppet last run on radium is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures [08:49:13] RECOVERY - puppet last run on elastic1035 is OK: OK: Puppet is currently enabled, last run 25 seconds ago with 0 failures [08:50:22] Hi all, should translations for T153465 be in translatewiki.net or in WikimediaMessages? Or both of it will work? 
[08:50:23] T153465: Add new page protection level on et.wikipedia.org - https://phabricator.wikimedia.org/T153465 [09:23:03] RECOVERY - HHVM jobrunner on mw1168 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.002 second response time [09:24:08] !log restarted stuck hhvm on mw1168 (forgot to run hhvm-dump-debug) [09:24:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:27:13] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479 [09:28:04] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 4383067 keys, up 47 days 1 hours - replication_delay is 38 [09:36:49] I am having problems with session loss, it pops in and out. Page will show fine in display mode, though rollback is failing repeatedly [09:38:13] usual methods of resolving such an issue are not functioning for me [09:38:48] !log ran apt-get clean and removed some /tmp files on stat1002 to free some space [09:39:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [09:39:53] PROBLEM - puppet last run on mw1162 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [09:40:03] PROBLEM - puppet last run on sca2004 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures.
Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[wipe],Package[zotero/translators],Package[zotero/translation-server] [10:06:13] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL ERROR - Redis Library - can not ping 10.192.48.44 on port 6479 [10:07:03] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 4381867 keys, up 47 days 1 hours - replication_delay is 0 [10:07:53] RECOVERY - puppet last run on mw1162 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures [10:08:03] RECOVERY - puppet last run on sca2004 is OK: OK: Puppet is currently enabled, last run 32 seconds ago with 0 failures [10:39:10] (03CR) 10Lydia Pintscher: [C: 031] Add new units for the following: [mediawiki-config] - 10https://gerrit.wikimedia.org/r/327907 (https://phabricator.wikimedia.org/T150881) (owner: 10Smalyshev) [10:58:20] Hi all, https://commons.wikimedia.org/wiki/File:Metro_Mad_Linea_7.png half-disappeared. Can somebody fix it? [11:16:17] 06Operations, 06Commons, 10media-storage, 05MW-1.27-release-notes: Some files had disappeared from Commons after renaming - https://phabricator.wikimedia.org/T111838#2883564 (10aaron) Is this still valid? [11:17:07] ^^ re Urbanecm “File not found: /v1/AUTH_mw/wikipedia-commons-local-public.cf/c/cf/Metro_Mad_Linea_7.png” [11:17:34] Urbanecm: You’ll probably need to file a ticket on Phabricator. [11:17:49] Revent, this can't be fixed easily? [11:18:03] Revent, can you help me what projects should I include? [11:18:43] Urbanecm: I actually think there is a tracking ticket for those, but I dunno what it is. [11:19:09] Revent, I've found T111838 [11:19:10] T111838: Some files had disappeared from Commons after renaming - https://phabricator.wikimedia.org/T111838 [11:19:12] Sometimes they can fix them, sometimes they can’t. [11:19:30] And when they can't? The file will be "deleted"? 
[11:20:19] Urbanecm: If they can’t find it, yeah, sometimes no real choice. [11:20:40] (and it sucks) [11:20:58] Revent, I've probably found the tracking task. Do you mean T108517? [11:20:58] T108517: PNG thumbnails issues (tracking) - https://phabricator.wikimedia.org/T108517 [11:21:18] No, that’s about rendering of thumbs. [11:23:05] Ok [11:23:58] I've filed T153540 for it. [11:23:59] T153540: Metro Mad Linea 7.png file half-disappeared - it can't be used - https://phabricator.wikimedia.org/T153540 [11:24:25] It’s ‘shown’ in Google and TinEye searches, but I can’t get it to show me the full-size… looking at wayback now (showly) [11:24:29] (slowly) [11:26:31] It’s old enough it might exist in a backup copy. [11:27:27] 06Operations, 06Commons, 06Multimedia, 10media-storage, 15User-Urbanecm: Metro Mad Linea 7.png file half-disappeared - it can't be used - https://phabricator.wikimedia.org/T153540#2883589 (10Urbanecm) [12:17:43] 06Operations, 06Commons, 06Multimedia, 10media-storage, 15User-Urbanecm: Metro Mad Linea 7.png file half-disappeared - it can't be used - https://phabricator.wikimedia.org/T153540#2883589 (10Pokefan95) Accessing the original file returns a 404 error. [12:20:04] PROBLEM - check_mysql on fdb2001 is CRITICAL: SLOW_SLAVE CRITICAL: Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 1696 [12:25:04] RECOVERY - check_mysql on fdb2001 is OK: Uptime: 312867 Threads: 1 Questions: 99638169 Slow queries: 1904 Opens: 3652 Flush tables: 2 Open tables: 542 Queries per second avg: 318.468 Slave IO: Yes Slave SQL: Yes Seconds Behind Master: 0 [12:29:24] 06Operations, 06Commons, 06Multimedia, 10media-storage, 15User-Urbanecm: Metro Mad Linea 7.png file half-disappeared - it can't be used - https://phabricator.wikimedia.org/T153540#2883683 (10Urbanecm) I know. I meant why it returns 404. [12:39:13] PROBLEM - puppet last run on cp3022 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues [13:03:33] PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:08:13] RECOVERY - puppet last run on cp3022 is OK: OK: Puppet is currently enabled, last run 57 seconds ago with 0 failures [13:27:04] PROBLEM - puppet last run on terbium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [13:29:21] 06Operations, 06Commons, 10media-storage, 05MW-1.27-release-notes: Some files had disappeared from Commons after renaming - https://phabricator.wikimedia.org/T111838#2883757 (10matmarex) [13:31:17] 06Operations, 06Commons, 10media-storage, 05MW-1.27-release-notes: Some files had disappeared from Commons after renaming - https://phabricator.wikimedia.org/T111838#1617529 (10matmarex) I'm not sure what you mean by "still valid". The files mentioned in the task description were almost all re-uploaded (ex... [13:32:33] RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures [13:35:54] 06Operations, 06Commons, 10media-storage, 05MW-1.27-release-notes: Some files had disappeared from Commons after renaming - https://phabricator.wikimedia.org/T111838#2883763 (10Ankry) I think yes, as thumbnails of initial file revisions are still not available. But the priority can be lowered IMO, as all f... [13:56:03] RECOVERY - puppet last run on terbium is OK: OK: Puppet is currently enabled, last run 41 seconds ago with 0 failures [14:00:26] 06Operations, 06Commons, 06Multimedia, 10media-storage, 15User-Urbanecm: Metro Mad Linea 7.png file half-disappeared - it can't be used - https://phabricator.wikimedia.org/T153540#2883818 (10matmarex) Well, it looks gone. The file was uploaded in 2012, apparently still existed around a year ago: http://w... [14:08:43] PROBLEM - puppet last run on dbproxy1007 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [14:33:14] 06Operations, 06Commons, 10media-storage, 05MW-1.27-release-notes: Some files had disappeared from Commons after renaming - https://phabricator.wikimedia.org/T111838#2883857 (10aaron) By "valid" I mean "new occurrences" (though at least the one of them was earlier in the year). Of course, any missing files... [14:37:43] RECOVERY - puppet last run on dbproxy1007 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures [14:48:33] PROBLEM - puppet last run on labsdb1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [15:16:33] RECOVERY - puppet last run on labsdb1005 is OK: OK: Puppet is currently enabled, last run 30 seconds ago with 0 failures [16:14:11] 06Operations, 06Commons, 10media-storage, 05MW-1.27-release-notes: Some files had disappeared from Commons after renaming - https://phabricator.wikimedia.org/T111838#2883942 (10matmarex) T153540 is an identical issue that was just filed today. [16:29:53] 07Puppet, 10Beta-Cluster-Infrastructure, 13Patch-For-Review: puppet failure on deployment-phab01 ... is not a Hash. It looks to be a Array at /etc/puppet/modules/phabricator/manifests/init.pp:68 - https://phabricator.wikimedia.org/T147818#2883944 (10Krenair) It won't install mariadb-client-10.0 because that... [16:37:28] 06Operations, 06Commons, 10TimedMediaHandler-Transcode, 10Wikimedia-Video: Commons video transcoders have over 6500 tasks in the backlog. - https://phabricator.wikimedia.org/T153488#2883948 (10Aklapper) p:05Triage>03High [17:02:04] PROBLEM - puppet last run on sca2003 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[wipe],Package[zotero/translators],Package[zotero/translation-server] [17:21:03] PROBLEM - puppet last run on analytics1056 is CRITICAL: CRITICAL: Catalog fetch fail. 
Either compilation failed or puppetmaster has issues [17:29:13] RECOVERY - puppet last run on sca2003 is OK: OK: Puppet is currently enabled, last run 22 seconds ago with 0 failures [17:49:03] RECOVERY - puppet last run on analytics1056 is OK: OK: Puppet is currently enabled, last run 0 seconds ago with 0 failures [17:53:03] PROBLEM - puppet last run on db1055 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [18:16:24] (03CR) 1020after4: [C: 031] phabricator: delete labs role [puppet] - 10https://gerrit.wikimedia.org/r/327690 (https://phabricator.wikimedia.org/T139475) (owner: 10Dzahn) [18:20:03] RECOVERY - puppet last run on db1055 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures [18:21:39] (03CR) 10Paladox: [C: 031] "Wont the deployment phab-01 and phab-02 need migration to the main puppet class?" [puppet] - 10https://gerrit.wikimedia.org/r/327690 (https://phabricator.wikimedia.org/T139475) (owner: 10Dzahn) [19:09:08] 07Puppet, 10Beta-Cluster-Infrastructure, 13Patch-For-Review: puppet failure on deployment-phab01 ... is not a Hash. It looks to be a Array at /etc/puppet/modules/phabricator/manifests/init.pp:68 - https://phabricator.wikimedia.org/T147818#2884087 (10Paladox) You could apt-get --purge remove mysql-client-cor... [19:10:48] 07Puppet, 10Beta-Cluster-Infrastructure, 13Patch-For-Review: puppet failure on deployment-phab01 ... is not a Hash. It looks to be a Array at /etc/puppet/modules/phabricator/manifests/init.pp:68 - https://phabricator.wikimedia.org/T147818#2884092 (10Paladox) Anyways labs phabricator class is being removed i... [21:02:18] 06Operations, 06Commons, 10TimedMediaHandler-Transcode, 10Wikimedia-Video: Commons video transcoders have over 6500 tasks in the backlog. - https://phabricator.wikimedia.org/T153488#2884159 (10Dereckson) >>! In T153488#2883297, @Pokefan95 wrote: > I am going to mark all server-side upload Phabricator tasks... 
[21:24:03] PROBLEM - puppet last run on sca1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues [21:35:23] PROBLEM - puppet last run on baham is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:36:23] RECOVERY - puppet last run on baham is OK: OK: Puppet is currently enabled, last run 18 minutes ago with 0 failures [21:39:23] PROBLEM - puppet last run on baham is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. [21:40:23] RECOVERY - puppet last run on baham is OK: OK: Puppet is currently enabled, last run 22 minutes ago with 0 failures [21:53:03] RECOVERY - puppet last run on sca1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures [22:31:30] (it should stop doing that now) [22:44:09] Krenair did you restart it by going through ssh? [22:44:17] yes [22:44:20] I guess we should revert https://gerrit.wikimedia.org/r/#/c/323712/ [22:44:25] that's ^^ the cause of it [22:46:24] maybe try to figure out why it was doing that first [22:46:42] We have, it's when we call .disconnect [22:46:56] and then do the .connect inside of the disconnect [22:47:57] we have some pastes from the test bot doing the same thing, https://phabricator.wikimedia.org/P4629 https://phabricator.wikimedia.org/P4629 https://phabricator.wikimedia.org/P4627 [22:48:04] seems like something to do with ping. [22:50:04] well, it succeeds in connecting again and joining channels, but then gets killed for not responding to pings, right? [22:50:14] yep, looks like it [22:53:36] I don't know if this https://github.com/martynsmith/node-irc/commit/c91436f98cd49a5aefcd1f7287bef120519f5998 is the cause but it introduces CyclingPingTimer [22:56:48] This looks like something https://github.com/matrix-org/node-irc/commit/45d7ca190477bd545817e8152f2e626c9166cf6b [22:57:44] That ^^ looks like some kind of fix [22:58:09] that just needs backporting to the main node-irc as that one is a fork.
[23:32:38] It seems the bot doesn't fully disconnect from IRC, it disconnects from the channels [23:39:57] Found a fix, it is .connect that is broken so using .join works
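The last two messages describe the workaround only in words: the bot leaves its channels without fully dropping the connection, so rejoining with .join works where .connect does not. A minimal sketch of that idea follows. It does not use the real node-irc API: `IrcLikeClient`, `FakeClient`, `restoreChannels` and the channel names are all hypothetical stand-ins chosen for illustration, and the fake only models the state described in the log, not the ping handling.

```typescript
// Hypothetical stand-in for an IRC client; NOT the real node-irc API surface.
interface IrcLikeClient {
  isConnected(): boolean;
  joinedChannels(): string[];
  connect(): void;
  join(channel: string): void;
}

// In-memory fake used only to exercise the logic below.
class FakeClient implements IrcLikeClient {
  private connected = true;
  private channels: string[] = [];
  public connectCalls = 0;

  isConnected(): boolean {
    return this.connected;
  }

  joinedChannels(): string[] {
    return this.channels.slice();
  }

  connect(): void {
    this.connectCalls += 1;
    this.connected = true;
  }

  join(channel: string): void {
    if (this.channels.indexOf(channel) === -1) {
      this.channels.push(channel);
    }
  }

  // Models the state described in the log: channels are left, the connection stays up.
  dropChannels(): void {
    this.channels = [];
  }
}

// Rejoin any missing channels; open a new connection only if the client
// reports that it is not connected at all.
function restoreChannels(client: IrcLikeClient, wanted: string[]): string[] {
  if (!client.isConnected()) {
    client.connect();
  }
  const present = client.joinedChannels();
  const rejoined: string[] = [];
  for (const channel of wanted) {
    if (present.indexOf(channel) === -1) {
      client.join(channel);
      rejoined.push(channel);
    }
  }
  return rejoined;
}

// Placeholder channel names, not taken from the log.
const client = new FakeClient();
client.join("#example-a");
client.join("#example-b");
client.dropChannels();
const rejoined = restoreChannels(client, ["#example-a", "#example-b"]);
```

In this sketch `connect` is never called while the client still reports itself as connected, which mirrors the reported fix of calling .join rather than reconnecting from inside the disconnect callback.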