[00:04:09] Cloud-Services, cloud-services-team (Kanban), Operations, Patch-For-Review: Reimage labstore1001 and labstore1002 for DRBD storage setup - https://phabricator.wikimedia.org/T158196#3462394 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['labstore1001.eqiad.wmnet'] ``` and were **ALL**...
[00:12:20] Cloud-Services, cloud-services-team (Kanban), Operations, Patch-For-Review: Reimage labstore1001 and labstore1002 for DRBD storage setup - https://phabricator.wikimedia.org/T158196#3462408 (ops-monitoring-bot) Script wmf_auto_reimage was launched by madhuvishy on neodymium.eqiad.wmnet for hosts:...
[00:13:49] !log servermon deleting all web proxies and VMs.
[00:13:51] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Servermon/SAL
[00:19:21] PROBLEM - Puppet errors on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[00:37:08] Cloud-Services, cloud-services-team (Kanban), Operations, Patch-For-Review: Reimage labstore1001 and labstore1002 for DRBD storage setup - https://phabricator.wikimedia.org/T158196#3462453 (ops-monitoring-bot) Completed auto-reimage of hosts: ``` ['labstore1002.eqiad.wmnet'] ``` and were **ALL**...
[01:24:21] RECOVERY - Puppet errors on tools-webgrid-generic-1401 is OK: OK: Less than 1.00% above the threshold [0.0]
[01:48:18] PROBLEM - Puppet errors on tools-exec-1402 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [0.0]
[02:15:18] Toolforge, InternetArchiveBot (v1.4), User-Zppix: IABot Management interface: Make the login sessions last longer or add an option for "remember me" - https://phabricator.wikimedia.org/T170849#3462572 (Cyberpower678) Open>Resolved Per the suggestion of @bd808, I have created my own session ha...
[02:27:49] Toolforge, InternetArchiveBot (v1.4), User-Zppix: IABot Management interface: Make the login sessions last longer or add an option for "remember me" - https://phabricator.wikimedia.org/T170849#3462577 (Cyberpower678) Resolved>Open And it seems to not work at all on tool labs. FFS.
[02:28:06] Toolforge, InternetArchiveBot (v1.4), User-Zppix: IABot Management interface: Make the login sessions last longer or add an option for "remember me" - https://phabricator.wikimedia.org/T170849#3462579 (Cyberpower678) Tool is completely broken now.
[02:28:16] RECOVERY - Puppet errors on tools-exec-1402 is OK: OK: Less than 1.00% above the threshold [0.0]
[02:30:12] PROBLEM - Puppet errors on tools-exec-1408 is CRITICAL: CRITICAL: 66.67% of data above the critical threshold [0.0]
[02:46:25] VPS-project-Wikistats, Patch-For-Review: numbers in rank.php wrong? - https://phabricator.wikimedia.org/T168474#3462612 (Dzahn) @Danny_B Fixed! http://wikistats.wmflabs.org/rank.php?family=wikt&lang=cs cs.wiktionary 44 172 116 877 I confirmed these numbers look right now. rank 44 in wiktionaries, tot...
[02:46:52] VPS-project-Wikistats, Patch-For-Review: numbers in rank.php wrong? - https://phabricator.wikimedia.org/T168474#3462613 (Dzahn) Open>Resolved p:Triage>Normal
[02:59:54] bd808: so I'm getting a 500 error on my tool. I don't know why.
When I check the logs, I see:
[02:59:57] 2017-07-22 02:58:19: (mod_fastcgi.c.2540) unexpected end-of-file (perhaps the fastcgi process died): pid: 28847 socket: unix:/var/run/lighttpd/php.socket.iabot-1
[02:59:58] 2017-07-22 02:58:19: (mod_fastcgi.c.3326) response not received, request sent: 1010 on socket: unix:/var/run/lighttpd/php.socket.iabot-1 for /iabot/index.php?, closing connection
[03:00:13] 2017-07-22 02:58:19: (mod_fastcgi.c.2540) unexpected end-of-file (perhaps the fastcgi process died): pid: 28847 socket: unix:/var/run/lighttpd/php.socket.iabot-1
[03:00:13] 2017-07-22 02:58:19: (mod_fastcgi.c.3326) response not received, request sent: 1010 on socket: unix:/var/run/lighttpd/php.socket.iabot-1 for /iabot/index.php?, closing connection
[03:03:47] Cyberpower678: I would suggest restarting the webservice. I'm on 1G at a cabin in the mountains or I'd log in and see if I could figure out more. :)
[03:04:11] bd808: I already tried that. :/
[03:04:34] I assume it has something to do with my update to have the session handler store sessions in my DB.
[03:04:56] But for some reason it's triggering a server error, and not a very helpful one. :/
[03:05:09] Even worse, I can't replicate this error on my machine.
[03:05:15] RECOVERY - Puppet errors on tools-exec-1408 is OK: OK: Less than 1.00% above the threshold [0.0]
[03:06:08] Fortunately I smartly designed the code so I can just unload the DB from the handler and it will default to PHP's default.
[03:11:05] bd808: Maybe you can help tomorrow or know someone who can. I deactivated the DB session handler. It works perfectly on my machine, but seems to cause a 500 error on toolforge. With it deactivated, the tool is working again, so I can breathe, but...
[03:12:40] Toolforge, InternetArchiveBot (v1.4), User-Zppix: IABot Management interface: Make the login sessions last longer or add an option for "remember me" - https://phabricator.wikimedia.org/T170849#3462626 (Cyberpower678) I had to switch back to PHP's default handler. The DB handler doesn't seem to like...
[03:50:54] PROBLEM - Puppet errors on tools-exec-1436 is CRITICAL: CRITICAL: 40.00% of data above the critical threshold [0.0]
[04:10:55] RECOVERY - Puppet errors on tools-exec-1436 is OK: OK: Less than 1.00% above the threshold [0.0]
[04:55:26] VPS-project-Wikistats: Add Assamese Wikisource to Wikistats - https://phabricator.wikimedia.org/T164240#3462711 (Dzahn) a:Dzahn
[05:22:18] VPS-project-Wikistats: Add Assamese Wikisource to Wikistats - https://phabricator.wikimedia.org/T164240#3462719 (Dzahn) Open>Resolved @Dcljr Thanks for reporting and sorry for the delay, i need better notification settings for this tag. also feel free to add me personally if you see an issue again....
[06:43:05] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1427 is CRITICAL: CRITICAL: 44.44% of data above the critical threshold [0.0]
[07:18:04] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1427 is OK: OK: Less than 1.00% above the threshold [0.0]
[08:08:18] cloud-services-team (Kanban), Project-Admins, User-bd808: Rename and update Cloud Services Phabricator projects - https://phabricator.wikimedia.org/T167244#3462797 (MarcoAurelio) @bd808 Do we plan to migrate/change repository names on Phabricator, Gerrit and GitHub as well? For example {rTSTW} // lab...
[08:12:29] PROBLEM - Puppet errors on tools-webgrid-lighttpd-1416 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [0.0]
[08:47:28] RECOVERY - Puppet errors on tools-webgrid-lighttpd-1416 is OK: OK: Less than 1.00% above the threshold [0.0]
[09:15:21] PROBLEM - Puppet errors on tools-webgrid-generic-1401 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[09:55:22] RECOVERY - Puppet errors on tools-webgrid-generic-1401 is OK: OK: Less than 1.00% above the threshold [0.0]
[10:19:02] hey
[10:19:15] I don't seem to be able to ssh into puppet-ema.puppet.eqiad.wmflabs anymore
[10:20:46] is anyone else having trouble authenticating to labs instances?
[10:38:46] ema: there were some ldap issues last week, but those should be resolved. Let me take a look...
[10:39:39] valhallasw`cloud: thanks. Meanwhile I've created a new instance (puppet-ema-2) and I can access it without problems
[10:40:37] Yeah, it's the ldap issue. I guess this is a self-hosted puppetmaster?
[10:40:49] correct
[10:41:43] can I safely pull updates from operations-puppet?
[10:43:25] you mean onto puppet-ema? If so, yes
[10:43:34] OK :-)
[10:45:32] !log puppet ran git pull && git stash && git rebase origin/production on puppet-ema.puppet.eqiad.wmflabs && git stash apply on puppet-ema ; had to manually merge modules/base/manifests/kernel.pp
[10:45:35] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Puppet/SAL
[10:48:46] !log puppet puppet agent -tv is still broken; ran git stash, ran puppet agent -tv.
ssh-user-key-lookup now works again, which should allow ema to login and clean up the puppet repository :-)
[10:48:47] Logged the message at https://wikitech.wikimedia.org/wiki/Nova_Resource:Puppet/SAL
[10:49:06] ema: ^ there are now some stashed changes in /etc/puppet/manifests, but you should be able to login again :-)
[10:50:47] valhallasw`cloud: thanks, still no luck though
[10:52:30] one of the services needs restarting
[10:56:02] ema try restarting it :)
[10:56:12] that will restart the service
[10:59:45] ema: ah, I might need to restart nslcd. Waiting for the cache to expire should also work
[10:59:55] I'll be back in 30 mins or so to take a look
[11:00:12] valhallasw`cloud, paladox: yeah rebooting the instance did the trick. Thanks!
[11:00:18] You're welcome :)
[19:54:33] Wikibugs, XTools, Patch-For-Review: Update XTools on Wikibugs - https://phabricator.wikimedia.org/T171265#3463252 (Matthewrbowker) Wikibugs still is not reporting in the channel, though the patch was merged.
[20:50:27] PROBLEM - Puppet errors on tools-bastion-03 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [0.0]
[20:54:46] Tool-Global-user-contributions, Collaboration-Team-Triage, Flow: Add Flow contributions to GUC - https://phabricator.wikimedia.org/T114777#3463302 (MusikAnimal)
[20:55:22] Tool-Global-user-contributions, Collaboration-Team-Triage, Flow: Add Flow contributions to GUC - https://phabricator.wikimedia.org/T114777#1705954 (MusikAnimal) There is now a dedicated ticket for XTools at T136950, so changing this one to be just for GUC
[20:55:38] Tool-Global-user-contributions, Collaboration-Team-Triage, Flow: Add Flow contributions to GUC - https://phabricator.wikimedia.org/T114777#3463309 (MusikAnimal)
[20:55:57] Tool-Global-user-contributions, Collaboration-Team-Triage, Flow: Add Flow contributions to GUC - https://phabricator.wikimedia.org/T114777#1705954 (MusikAnimal)
[21:25:25] RECOVERY - Puppet errors on tools-bastion-03 is OK: OK: Less than 1.00% above the threshold [0.0]
[21:59:58] Tool-Global-user-contributions: Add Flow contributions to GUC - https://phabricator.wikimedia.org/T114777#3463345 (Krinkle) Reducing it further. This task is to track the end result in GUC. The Flow-side of things is {T114777}. Once that is resolved, any necessary changes can be made on the GUC side.
[22:01:41] bd808: are available?
[22:02:08] *when
[22:02:17] *when are you available?
[22:52:46] bd808: Hi, good evening
[22:52:50] I am done with the docs
[22:54:52] bd808: What are the next steps please?
[23:02:47] Cyberpower678: d3r1ck: It's a Saturday, fellas.
[23:03:15] Niharika: that's no excuse. ;)
[23:06:00] Niharika: :D. I agree with you Niharika
[23:06:28] Anyway, I am off to bed now. Will ping B. Davis on week days :)
[23:06:54] Cyberpower678: It most definitely is. Ask back on Monday. :)
[23:07:11] d3r1ck: Night!
[23:07:44] Niharika: (y)
[23:54:38] Niharika: :p