[00:07:18] (PS13) Mobrovac: RESTBase-Cassandra: Add the topk reporter [puppet] - https://gerrit.wikimedia.org/r/328660 (https://phabricator.wikimedia.org/T147366)
[00:12:08] RECOVERY - puppet last run on analytics1001 is OK: OK: Puppet is currently enabled, last run 23 seconds ago with 0 failures
[00:32:54] (CR) Mattflaschen: [C: 1] "These look good in combination." [mediawiki-config] - https://gerrit.wikimedia.org/r/334025 (https://phabricator.wikimedia.org/T155997) (owner: AndyRussG)
[00:35:53] (CR) Mattflaschen: [C: 1] "I guess we can deploy this with the train on Tuesday. It doesn't seem like there's a need to cherry-pick it to 1.29.0-wmf.9. Since it's " [mediawiki-config] - https://gerrit.wikimedia.org/r/334025 (https://phabricator.wikimedia.org/T155997) (owner: AndyRussG)
[00:37:23] Vito: https://de.wikipedia.org/w/index.php?title=Wikipedia:L%C3%B6schkandidaten&action=info <- will not be solved, is there a mediawiki bug? action info does not work with pages of namespace 4. Do you know anything about it?
[00:39:02] I think I'm not able to assist you doctaxon
[00:39:17] thank you, do you know anyone?
[00:39:44] though it seems to happen only at de.wiki
[00:39:52] so it might be something related to localisation
[00:40:17] the best way is opening a phab task
[00:40:59] (PS14) Mobrovac: RESTBase-Cassandra: Add the topk reporter [puppet] - https://gerrit.wikimedia.org/r/328660 (https://phabricator.wikimedia.org/T147366)
[00:41:04] oh wait
[00:41:28] seems to be a performance issue doctaxon
[00:41:58] performance issue
[00:41:59] that page has 12 161 revs
[00:42:13] so generating info seems to be a heavy task
[00:42:50] ah okay
[00:43:14] you can still report it
[00:43:17] i'll try another page in namespace 4
[00:43:26] I tried and it seems to work
[00:43:42] yes it does
[00:44:04] while this doesn't https://it.wikipedia.org/w/index.php?title=Wikipedia:Pagina_delle_prove&action=info
[00:44:15] since it has >11k revs
[00:44:24] oh it eventually worked
[00:44:42] oh wait
[00:44:59] the page above has tons of subpages?
[00:45:21] yes, right
[00:45:39] uhm I just tried with an 88k subpages page
[00:45:41] and it worked
[00:45:55] there's something truly heavy in that particular page
[00:46:12] :(
[00:46:36] tons of templates?
[00:47:13] very many subpages
[00:47:17] and revisions
[00:47:39] 12k revs
[00:47:53] though I tested it on 88k subpages
[00:47:59] Request from 84.155.142.69 via cp3043 cp3043, Varnish XID 754445378
[00:48:02] Error: 503, Backend fetch failed at Sat, 28 Jan 2017 00:46:46 GMT
[00:48:36] it returned me the same IP address
[00:48:42] which it shouldn't
[00:49:08] ???
[00:50:10] 84.155.142.69 is not my ip
[00:50:38] (PS15) Mobrovac: RESTBase-Cassandra: Add the topk reporter [puppet] - https://gerrit.wikimedia.org/r/328660 (https://phabricator.wikimedia.org/T147366)
[00:50:58] maybe some internal ip
[00:51:10] it's a German ip
[00:51:12] fetching the result to transport
[00:51:35] if it's not yours I think there's something truly broken in cache for that particular page
[00:52:33] yes, it's mine
[00:57:42] (PS16) Mobrovac: RESTBase-Cassandra: Add the topk reporter [puppet] - https://gerrit.wikimedia.org/r/328660 (https://phabricator.wikimedia.org/T147366)
[00:58:39] task opened
[01:02:21] (PS17) Mobrovac: RESTBase-Cassandra: Add the topk reporter [puppet] - https://gerrit.wikimedia.org/r/328660 (https://phabricator.wikimedia.org/T147366)
[01:04:57] (CR) Mobrovac: "PCC (finally) OK - https://puppet-compiler.wmflabs.org/5272/" [puppet] - https://gerrit.wikimedia.org/r/328660 (https://phabricator.wikimedia.org/T147366) (owner: Mobrovac)
[01:05:38] PROBLEM - Start a job and verify on Precise on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/precise - 185 bytes in 0.435 second response time
[01:06:38] RECOVERY - Start a job and verify on Precise on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 1.000 second response time
[01:08:38] PROBLEM - Start a job and verify on Trusty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/trusty - 185 bytes in 0.515 second response time
[01:09:38] RECOVERY - Start a job and verify on Trusty on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.863 second response time
[01:18:56] (CR) Volans: [C: 1] "@godog please have a look you too. I'm still not fully convinced about the necessity of the crond module, but apart that looks in good sha" [puppet] - https://gerrit.wikimedia.org/r/328660 (https://phabricator.wikimedia.org/T147366) (owner: Mobrovac)
[01:25:38] PROBLEM - Start a job and verify on Trusty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/trusty - 185 bytes in 0.460 second response time
[01:26:38] RECOVERY - Start a job and verify on Trusty on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 1.586 second response time
[01:35:08] PROBLEM - Postgres Replication Lag on maps1002 is CRITICAL: CRITICAL - Rep Delay is: 1828.508832 Seconds
[01:36:08] RECOVERY - Postgres Replication Lag on maps1002 is OK: OK - Rep Delay is: 44.367921 Seconds
[01:39:08] thx Vito for T156537
[01:39:09] T156537: Unable to load page properties of a certain page - https://phabricator.wikimedia.org/T156537
[01:42:38] PROBLEM - Start a job and verify on Precise on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/precise - 185 bytes in 0.442 second response time
[01:45:08] Operations, ArchCom-RfC, Commons, MediaWiki-File-management, and 14 others: Define an official thumb API - https://phabricator.wikimedia.org/T66214#2978964 (Tgr)
[01:45:18] PROBLEM - puppet last run on analytics1057 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:45:26] Operations, ArchCom-RfC, Commons, MediaWiki-File-management, and 14 others: Define an official thumb API - https://phabricator.wikimedia.org/T66214#2782384 (Tgr)
[01:45:38] RECOVERY - Start a job and verify on Precise on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 1.930 second response time
[01:58:18] PROBLEM - puppet last run on druid1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[02:05:38] PROBLEM - Start a job and verify on Precise on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/precise - 185 bytes in 0.339 second response time
[02:06:38] PROBLEM - Start a job and verify on Trusty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/trusty - 185 bytes in 0.767 second response time
[02:08:38] RECOVERY - Start a job and verify on Trusty on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 1.288 second response time
[02:09:38] RECOVERY - Start a job and verify on Precise on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.913 second response time
[02:14:18] RECOVERY - puppet last run on analytics1057 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[02:18:02] !log l10nupdate@tin scap sync-l10n completed (1.29.0-wmf.9) (duration: 05m 39s)
[02:18:06] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:22:49] !log l10nupdate@tin ResourceLoader cache refresh completed at Sat Jan 28 02:22:49 UTC 2017 (duration 4m 47s)
[02:22:52] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:26:18] RECOVERY - puppet last run on druid1002 is OK: OK: Puppet is currently enabled, last run 45 seconds ago with 0 failures
[02:29:38] PROBLEM - Start a job and verify on Trusty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/trusty - 185 bytes in 0.713 second response time
[02:30:38] RECOVERY - Start a job and verify on Trusty on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.564 second response time
[02:50:38] PROBLEM - Start a job and verify on Precise on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/precise - 185 bytes in 0.755 second response time
[02:51:35] (PS1) Andrew Bogott: Keystone: Define some stopped services on the spare host [puppet] - https://gerrit.wikimedia.org/r/334745
[02:51:38] RECOVERY - Start a job and verify on Precise on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 1.017 second response time
[02:53:12] (CR) Andrew Bogott: [C: 2] Keystone: Define some stopped services on the spare host [puppet] - https://gerrit.wikimedia.org/r/334745 (owner: Andrew Bogott)
[02:53:28] PROBLEM - All k8s worker nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/k8s/nodes/ready - 185 bytes in 0.240 second response time
[02:55:48] RECOVERY - puppet last run on labcontrol1002 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures
[03:15:38] PROBLEM - Start a job and verify on Trusty on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/trusty - 185 bytes in 1.926 second response time
[03:16:22] (CR) Volans: "I vote for having the linter in the submodules, this way the errors can be catched as soon as the CR is sent against the submodule, not wh" [puppet] - https://gerrit.wikimedia.org/r/331239 (https://phabricator.wikimedia.org/T154915) (owner: Hashar)
[03:16:38] RECOVERY - Start a job and verify on Trusty on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 1.447 second response time
[03:21:28] RECOVERY - All k8s worker nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.132 second response time
[03:28:38] PROBLEM - Start a job and verify on Precise on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/precise - 185 bytes in 1.823 second response time
[03:31:38] RECOVERY - Start a job and verify on Precise on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 1.727 second response time
[03:33:57] (PS3) TTO: Enable expiring user groups on beta [mediawiki-config] - https://gerrit.wikimedia.org/r/333652
[03:40:38] PROBLEM - puppet last run on cp3044 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[03:51:38] PROBLEM - Start a job and verify on Precise on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/grid/start/precise - 185 bytes in 1.454 second response time
[03:54:38] RECOVERY - Start a job and verify on Precise on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 1.036 second response time
[04:08:38] RECOVERY - puppet last run on cp3044 is OK: OK: Puppet is currently enabled, last run 17 seconds ago with 0 failures
[04:19:39] PROBLEM - puppet last run on ms-fe3002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:19:48] Operations, Patch-For-Review: fix log reading permissions for dc-ops admin group - https://phabricator.wikimedia.org/T156529#2979060 (Peachey88)
[04:20:39] Operations, hardware-requests: Replace bast3001 - https://phabricator.wikimedia.org/T156506#2979061 (Peachey88)
[04:48:38] RECOVERY - puppet last run on ms-fe3002 is OK: OK: Puppet is currently enabled, last run 27 seconds ago with 0 failures
[05:25:38] PROBLEM - puppet last run on labtestcontrol2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[05:45:58] RECOVERY - cassandra-a CQL 10.64.0.213:9042 on aqs1007 is OK: TCP OK - 0.000 second response time on 10.64.0.213 port 9042
[05:54:38] RECOVERY - puppet last run on labtestcontrol2001 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[06:49:18] PROBLEM - Check HHVM threads for leakage on mw1168 is CRITICAL: CRITICAL: HHVM has more than double threads running or queued than apache has busy workers
[07:01:18] RECOVERY - Check HHVM threads for leakage on mw1168 is OK: OK
[07:05:02] Operations, Parsing-Team, Patch-For-Review: Visual-diff testreduce make ruthenium unresponsive - https://phabricator.wikimedia.org/T156177#2979088 (ssastry) >>! In T156177#2975028, @mobrovac wrote: > Thnx @Volans for taking care of this and keeping tabs on it :) @ssastry, please let us know once you...
[07:22:42] PROBLEM - LVS HTTP IPv4 on zotero.svc.eqiad.wmnet is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:22:48] PROBLEM - citoid endpoints health on scb1004 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:22:48] PROBLEM - citoid endpoints health on scb1003 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:22:48] PROBLEM - citoid endpoints health on scb1001 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:22:48] PROBLEM - citoid endpoints health on scb1002 is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
[07:22:48] PROBLEM - zotero on sca1003 is CRITICAL: CRITICAL - Socket timeout after 10 seconds
[07:23:48] RECOVERY - citoid endpoints health on scb1003 is OK: All endpoints are healthy
[07:23:48] RECOVERY - citoid endpoints health on scb1002 is OK: All endpoints are healthy
[07:23:48] RECOVERY - citoid endpoints health on scb1001 is OK: All endpoints are healthy
[07:24:31] RECOVERY - LVS HTTP IPv4 on zotero.svc.eqiad.wmnet is OK: HTTP OK: HTTP/1.0 200 OK - 62 bytes in 0.006 second response time
[07:24:38] RECOVERY - citoid endpoints health on scb1004 is OK: All endpoints are healthy
[07:24:38] RECOVERY - zotero on sca1003 is OK: HTTP OK: HTTP/1.0 200 OK - 62 bytes in 0.011 second response time
[07:26:02] mhh... seems recovered already
[07:26:11] wha
[07:28:00] fine
[07:51:08] PROBLEM - puppet last run on analytics1051 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[07:56:20] (CR) Hashar: [C: -1] "-1 pending related changes in submodules https://gerrit.wikimedia.org/r/#/q/project:%255Eoperations/puppet/.*+owner:hashar" [puppet] - https://gerrit.wikimedia.org/r/331239 (https://phabricator.wikimedia.org/T154915) (owner: Hashar)
[08:19:08] RECOVERY - puppet last run on analytics1051 is OK: OK: Puppet is currently enabled, last run 29 seconds ago with 0 failures
[09:47:04] (PS4) Juniorsys: Linting fixes (Multiple modules) [puppet] - https://gerrit.wikimedia.org/r/334276 (https://phabricator.wikimedia.org/T93645)
[09:47:09] (PS4) Juniorsys: deployment: Linting fixes [puppet] - https://gerrit.wikimedia.org/r/334278 (https://phabricator.wikimedia.org/T93645)
[09:47:13] (PS4) Juniorsys: dnsrecursor: Linting fixes [puppet] - https://gerrit.wikimedia.org/r/334279 (https://phabricator.wikimedia.org/T93645)
[09:47:17] (PS4) Juniorsys: etcd: Linting fixes [puppet] - https://gerrit.wikimedia.org/r/334282 (https://phabricator.wikimedia.org/T93645)
[09:47:21] (PS4) Juniorsys: eventlogging/eventstreams: Linting fixes [puppet] - https://gerrit.wikimedia.org/r/334283 (https://phabricator.wikimedia.org/T93645)
[09:47:25] (PS4) Juniorsys: extdist: Linting fixes [puppet] - https://gerrit.wikimedia.org/r/334284
[09:47:31] (PS4) Juniorsys: jupterhub/keyholder: Linting fixes [puppet] - https://gerrit.wikimedia.org/r/334287 (https://phabricator.wikimedia.org/T93645)
[09:47:36] (PS4) Juniorsys: labs modules linting changes [puppet] - https://gerrit.wikimedia.org/r/334290 (https://phabricator.wikimedia.org/T93645)
[09:47:43] (PS4) Juniorsys: ldap: Linting fixes [puppet] - https://gerrit.wikimedia.org/r/334291 (https://phabricator.wikimedia.org/T93645)
[09:47:52] (PS4) Juniorsys: librenms/locales/logstash/lshell linting changes [puppet] - https://gerrit.wikimedia.org/r/334293 (https://phabricator.wikimedia.org/T93645)
[09:47:59] (PS4) Juniorsys: lvm/lvs: Linting changes [puppet] - https://gerrit.wikimedia.org/r/334294 (https://phabricator.wikimedia.org/T93645)
[09:48:07] (PS4) Juniorsys: Linting changes (multiple) [puppet] - https://gerrit.wikimedia.org/r/334295 (https://phabricator.wikimedia.org/T93645)
[09:48:16] (PS4) Juniorsys: mysql: Linting changes [puppet] - https://gerrit.wikimedia.org/r/334298 (https://phabricator.wikimedia.org/T93645)
[09:48:24] (PS4) Juniorsys: Linting changes (multiple) [puppet] - https://gerrit.wikimedia.org/r/334299 (https://phabricator.wikimedia.org/T93645)
[09:48:29] (PS4) Juniorsys: ores/otrs/package_builder: Linting changes [puppet] - https://gerrit.wikimedia.org/r/334300 (https://phabricator.wikimedia.org/T93645)
[09:48:37] (PS4) Juniorsys: openstack: Linting changes [puppet] - https://gerrit.wikimedia.org/r/334301 (https://phabricator.wikimedia.org/T93645)
[09:48:42] (PS4) Juniorsys: profile linting changes [puppet] - https://gerrit.wikimedia.org/r/334303 (https://phabricator.wikimedia.org/T93645)
[09:48:47] (PS4) Juniorsys: prometheus: Linting changes [puppet] - https://gerrit.wikimedia.org/r/334306 (https://phabricator.wikimedia.org/T93645)
[09:48:53] (PS4) Juniorsys: puppet/puppet_compiler: Linting changes [puppet] - https://gerrit.wikimedia.org/r/334307 (https://phabricator.wikimedia.org/T93645)
[09:48:58] (PS4) Juniorsys: planet/pmacct/programdashboard/pybal lint changes [puppet] - https://gerrit.wikimedia.org/r/334308 (https://phabricator.wikimedia.org/T93645)
[09:49:03] (PS4) Juniorsys: quarry: Linting changes [puppet] - https://gerrit.wikimedia.org/r/334309 (https://phabricator.wikimedia.org/T93645)
[09:49:09] (PS4) Juniorsys: role: Linting changes (backup,bastionhost+others) [puppet] - https://gerrit.wikimedia.org/r/334310 (https://phabricator.wikimedia.org/T93645)
[09:49:17] (PS4) Juniorsys: redis: Linting changes [puppet] - https://gerrit.wikimedia.org/r/334311 (https://phabricator.wikimedia.org/T93645)
[09:49:25] (PS4) Juniorsys: Linting fixes (multiple modules) [puppet] - https://gerrit.wikimedia.org/r/334317 (https://phabricator.wikimedia.org/T93645)
[09:49:32] (PS4) Juniorsys: graphoid/gridengine/grub/haproxy/hhvm lint fixes [puppet] - https://gerrit.wikimedia.org/r/334319 (https://phabricator.wikimedia.org/T93645)
[09:49:38] (PS4) Juniorsys: ifttt/imagemagick/initramfs/interface lint fixes [puppet] - https://gerrit.wikimedia.org/r/334320 (https://phabricator.wikimedia.org/T93645)
[09:49:43] (PS3) Juniorsys: Puppet style: Use one line per include/require [puppet] - https://gerrit.wikimedia.org/r/334322
[09:53:56] Operations, Internet-Archive, Offline-Working-Group: Create backups of Wikimedia content in diverse geographic places - https://phabricator.wikimedia.org/T156544#2979167 (Qgil) I could not find a project tag directly related to this request, but I hope any of these groups might have a better idea.
[09:55:18] (CR) jerkins-bot: [V: -1] Linting changes (multiple) [puppet] - https://gerrit.wikimedia.org/r/334299 (https://phabricator.wikimedia.org/T93645) (owner: Juniorsys)
[09:57:38] PROBLEM - puppet last run on labnet1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[09:58:10] Operations, Internet-Archive, Offline-Working-Group: Create backups of Wikimedia content in diverse geographic places - https://phabricator.wikimedia.org/T156544#2979171 (Qgil)
[10:20:13] (PS1) Elukey: Enable AQS aqs1007-b cassandra instance [puppet] - https://gerrit.wikimedia.org/r/334753 (https://phabricator.wikimedia.org/T155654)
[10:25:38] RECOVERY - puppet last run on labnet1002 is OK: OK: Puppet is currently enabled, last run 24 seconds ago with 0 failures
[10:27:31] (CR) Elukey: "https://puppet-compiler.wmflabs.org/5273/aqs1007.eqiad.wmnet/" [puppet] - https://gerrit.wikimedia.org/r/334753 (https://phabricator.wikimedia.org/T155654) (owner: Elukey)
[11:53:28] PROBLEM - All k8s worker nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/k8s/nodes/ready - 185 bytes in 0.140 second response time
[12:20:28] RECOVERY - All k8s worker nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.123 second response time
[12:55:18] PROBLEM - puppet last run on stat1004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:05:09] Operations, Internet-Archive, Offline-Working-Group: Create backups of Wikimedia content in diverse geographic places - https://phabricator.wikimedia.org/T156544#2979277 (abian) Personally, I would like Wikimedia chapters and the Wikimedia Foundation to cooperate more. In particular, some Wikimedia c...
[13:12:43] (CR) Hashar: [C: 1] jupterhub/keyholder: Linting fixes [puppet] - https://gerrit.wikimedia.org/r/334287 (https://phabricator.wikimedia.org/T93645) (owner: Juniorsys)
[13:14:18] (CR) Hashar: [C: -1] dnsrecursor: Linting fixes (1 comment) [puppet] - https://gerrit.wikimedia.org/r/334279 (https://phabricator.wikimedia.org/T93645) (owner: Juniorsys)
[13:23:18] RECOVERY - puppet last run on stat1004 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[13:29:18] PROBLEM - puppet last run on rdb1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:58:18] RECOVERY - puppet last run on rdb1005 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[14:24:28] PROBLEM - puppet last run on db1063 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:52:28] RECOVERY - puppet last run on db1063 is OK: OK: Puppet is currently enabled, last run 6 seconds ago with 0 failures
[14:52:53] Operations, Wikimedia-General-or-Unknown: Unable to load page properties of a certain page - https://phabricator.wikimedia.org/T156537#2979399 (zhuyifei1999) Added #operations since their assistance may be needed to debug page hanging.
[14:54:52] PROBLEM - MariaDB disk space on labsdb1001 is CRITICAL: DISK CRITICAL - free space: /srv 185576 MB (5% inode=99%)
[14:56:11] ^ checking
[14:57:09] I'm here
[14:57:26] hey jynus
[14:57:47] -rw-rw---- 1 mysql mysql 69G Jan 28 14:34 #sql_1a3a_64.MAD
[14:57:54] lovely temp table
[14:58:55] <_joe_> marostegui: ehehh I was commenting elsewhere about that
[14:58:56] <_joe_> :P
[14:59:02] not sure what srvuserdata is
[14:59:11] I would drop it, extend the volume
[14:59:35] <_joe_> it has 1.1 T of things on it
[14:59:42] <_joe_> bbl
[14:59:49] yeah, but last touched aug 2014
[15:01:26] yeah, there is nothing recent there
[15:02:16] (PS1) Urbanecm: Increase default thumb size to 250px at nowiki [mediawiki-config] - https://gerrit.wikimedia.org/r/334787 (https://phabricator.wikimedia.org/T155892)
[15:02:48] it would make sense if user data was there, but it is not
[15:04:50] then there is https://phabricator.wikimedia.org/T132431
[15:05:38] I would drop s51187__xtools_tmp
[15:05:43] 300GB
[15:05:50] warned 1 year ago
[15:07:36] but s51187__xtools_tmp has activity today, or at least some files have been touched recently
[15:08:25] read https://phabricator.wikimedia.org/T133321
[15:10:26] they are all myisam tables, we move them to /srvuserdata
[15:10:59] it makes no sense that a user has more data there than enwiki and wikidata combined
[15:11:05] yeah, that is true
[15:11:10] 270G is completely mad
[15:12:15] "The table is supposed to be dropped right after it's created and processed"
[15:12:25] we drop what is there
[15:12:36] if things continue, we block the user
[15:13:42] we can drop all the tables in s51187__xtools_tmp yeah
[15:13:47] let's do that?
[15:14:24] 6992 rows in set (0.03 sec)
[15:14:46] let me do that
[15:15:21] ok
[15:19:43] then if you will do it, I have friends at home, going to log off. Call me if you need me please
[15:19:48] sure
[15:20:16] thanks!
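The cleanup agreed on above (dropping every table in `s51187__xtools_tmp`, about 6992 of them) could be scripted roughly as follows. This is only a sketch and not what was actually run on labsdb1001: the helper names `quote_identifier` and `build_drop_statements`, the batch size, and the example table names are all hypothetical, and the real table list would have to come from the server itself (for example from `information_schema.tables`).

```python
def quote_identifier(name):
    """Backtick-quote a MariaDB identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_drop_statements(schema, tables, batch_size=500):
    """Return DROP TABLE statements for `tables` in `schema`, `batch_size` tables per statement.

    Batching is an assumption made here to keep each statement short when a
    schema holds thousands of leftover temporary tables.
    """
    statements = []
    for start in range(0, len(tables), batch_size):
        batch = tables[start:start + batch_size]
        targets = ", ".join(
            quote_identifier(schema) + "." + quote_identifier(table) for table in batch
        )
        statements.append("DROP TABLE IF EXISTS " + targets + ";")
    return statements


if __name__ == "__main__":
    # Hypothetical table names, for illustration only.
    example_tables = ["tmp_a", "tmp_b", "tmp_c"]
    for statement in build_drop_statements("s51187__xtools_tmp", example_tables, batch_size=2):
        print(statement)
```

Generating the statements first, rather than executing them directly, leaves room to review them before anything is dropped.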
[15:22:21] Operations, Traffic, Wikimedia-General-or-Unknown, Browser-Support-Internet-Explorer, Upstream: Visting [[c:File:FEZ_trial_gameplay_HD.webm]] in IE11 shows errors in developer console about insecure data:image/png;base64 "URL" - https://phabricator.wikimedia.org/T148595#2727296 (zhuyifei1999)...
[15:22:52] RECOVERY - MariaDB disk space on labsdb1001 is OK: DISK OK
[15:53:19] PROBLEM - puppet last run on mw1297 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:16:58] PROBLEM - puppet last run on elastic1018 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[16:23:18] RECOVERY - puppet last run on mw1297 is OK: OK: Puppet is currently enabled, last run 50 seconds ago with 0 failures
[16:44:58] RECOVERY - puppet last run on elastic1018 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[18:15:13] Operations, Traffic, Wikimedia-General-or-Unknown, Browser-Support-Internet-Explorer, Upstream: Visting [[c:File:FEZ_trial_gameplay_HD.webm]] in IE11 shows errors in developer console about insecure data:image/png;base64 "URL" - https://phabricator.wikimedia.org/T148595#2979575 (Paladox) Not...
[18:39:47] Operations, Wikimedia-General-or-Unknown: Unable to load page properties of a certain page - https://phabricator.wikimedia.org/T156537#2979623 (Aklapper) I get a 504 Gateway Time-out instead of the 503 in the attachment of this task: ``` $:andre\> curl -v https://de.wikipedia.org/w/index.php?title=Wikipe...
[18:40:38] Operations, Wikimedia-General-or-Unknown: 504 Gateway Time-out on https://de.wikipedia.org/w/index.php?title=Wikipedia:L%C3%B6schkandidaten&action=info - https://phabricator.wikimedia.org/T156537#2979629 (Aklapper)
[18:47:08] PROBLEM - puppet last run on mw1298 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[18:48:09] Operations, Wikimedia-General-or-Unknown: 504 Gateway Time-out on https://de.wikipedia.org/w/index.php?title=Wikipedia:L%C3%B6schkandidaten&action=info - https://phabricator.wikimedia.org/T156537#2978892 (jcrespo) The query is most likely: ``` SELECT /* WikiPage::getOldestRevision */ rev_id,rev_page,rev...
[18:55:29] Operations, Wikimedia-General-or-Unknown: 504 Gateway Time-out on https://de.wikipedia.org/w/index.php?title=Wikipedia:L%C3%B6schkandidaten&action=info - https://phabricator.wikimedia.org/T156537#2979642 (jcrespo) This makes it faster: ``` SELECT /* WikiPage::getOldestRevision */ rev_id,rev_page,rev_text...
[19:06:28] PROBLEM - puppet last run on mw1272 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[19:17:08] RECOVERY - puppet last run on mw1298 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[19:34:14] Operations, Internet-Archive, Offline-Working-Group: Create backups of Wikimedia content in diverse geographic places - https://phabricator.wikimedia.org/T156544#2979128 (jeblad) Keeping an offline copy should be an obvious duty for the larger chapters.
[19:34:28] RECOVERY - puppet last run on mw1272 is OK: OK: Puppet is currently enabled, last run 21 seconds ago with 0 failures
[20:17:28] PROBLEM - puppet last run on mw1294 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[20:45:28] RECOVERY - puppet last run on mw1294 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[20:57:55] Operations, DBA, Gerrit, Patch-For-Review, Upstream: Gerrit shows HTTP 500 error when pasting extended unicode characters - https://phabricator.wikimedia.org/T145885#2979749 (Paladox) You can also have emoji's in your commit msg too :) http://gerrit-new.wmflabs.org/#/c/24/
[21:57:19] (CR) Legoktm: extdist: Linting fixes (2 comments) [puppet] - https://gerrit.wikimedia.org/r/334284 (owner: Juniorsys)
[23:02:38] PROBLEM - puppet last run on puppetmaster2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[23:30:38] RECOVERY - puppet last run on puppetmaster2001 is OK: OK: Puppet is currently enabled, last run 19 seconds ago with 0 failures
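The PROBLEM / RECOVERY notices in this log can be paired up to see how long each alert stayed open. The following is a minimal sketch rather than an existing tool: it assumes the log has been split to one message per line, that notices follow the `[HH:MM:SS] PROBLEM - <check> on <host> is CRITICAL: ...` shape seen here, and that the log stays within a single day. The function name `alert_durations` is made up for illustration.

```python
import re
from datetime import datetime

# Matches notices such as:
#   [14:54:52] PROBLEM - MariaDB disk space on labsdb1001 is CRITICAL: ...
#   [15:22:52] RECOVERY - MariaDB disk space on labsdb1001 is OK: DISK OK
ALERT_RE = re.compile(
    r"^\[(\d{2}:\d{2}:\d{2})\] (PROBLEM|RECOVERY) - (.+?) is (?:CRITICAL|OK)\b"
)


def alert_durations(lines):
    """Pair each PROBLEM with the next RECOVERY for the same check and host.

    Returns a list of (check_and_host, seconds_open) tuples in recovery order.
    A RECOVERY with no earlier PROBLEM in the input is ignored.
    """
    open_alerts = {}
    durations = []
    for line in lines:
        match = ALERT_RE.match(line)
        if match is None:
            continue  # ordinary chat or other bot output
        when = datetime.strptime(match.group(1), "%H:%M:%S")
        kind, key = match.group(2), match.group(3)
        if kind == "PROBLEM":
            # Keep the first PROBLEM time if the alert is repeated before recovering.
            open_alerts.setdefault(key, when)
        elif key in open_alerts:
            started = open_alerts.pop(key)
            durations.append((key, int((when - started).total_seconds())))
    return durations


if __name__ == "__main__":
    sample = [
        "[14:54:52] PROBLEM - MariaDB disk space on labsdb1001 is CRITICAL: DISK CRITICAL - free space: /srv 185576 MB (5% inode=99%)",
        "[14:56:11] ^ checking",
        "[15:22:52] RECOVERY - MariaDB disk space on labsdb1001 is OK: DISK OK",
    ]
    for name, seconds in alert_durations(sample):
        print(name, seconds)
```

Run on the labsdb1001 notices, this reports the disk-space alert as open for 1680 seconds (28 minutes), which matches the two timestamps in the log.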