[00:18:01] Operations, media-storage: uploads.wm.o commons archive 20170615014039!Adsalm.webm visible despite file deleted on Commons - https://phabricator.wikimedia.org/T168002#3356390 (zhuyifei1999) Actually, that file belongs to https://commons.wikimedia.org/wiki/File:20170615014039!Younes.webm, and it's gone af...
[00:33:04] PROBLEM - Text HTTP 5xx reqs/min on graphite1001 is CRITICAL: CRITICAL: 22.22% of data above the critical threshold [1000.0]
[00:37:04] RECOVERY - Text HTTP 5xx reqs/min on graphite1001 is OK: OK: Less than 1.00% above the threshold [250.0]
[00:59:38] (PS1) Dzahn: wikistats: add XML dumps for all tables, fix db access [puppet] - https://gerrit.wikimedia.org/r/359639
[01:01:41] (CR) Dzahn: [C: 2] "labs-only (not stats.wm.org), dumping stats files" [puppet] - https://gerrit.wikimedia.org/r/359639 (owner: Dzahn)
[01:03:16] Operations, Wikimedia-Site-requests: Global rename of Smuconlaw → Sgconlaw: supervision needed - https://phabricator.wikimedia.org/T168109#3356440 (alanajjar)
[01:03:59] Operations, DBA, Wikimedia-Site-requests: Global rename of Idh0854 → Garam: supervision needed - https://phabricator.wikimedia.org/T167031#3356453 (alanajjar) T168109
[01:05:53] Operations, Labs, Labs-Infrastructure: Puppet CA: virt1000.wikimedia.org' will expire on 2017-08-15 - https://phabricator.wikimedia.org/T168110#3356455 (Dzahn)
[01:07:52] Operations, Labs, Labs-Infrastructure: Puppet CA: virt1000.wikimedia.org' will expire on 2017-08-15 - https://phabricator.wikimedia.org/T168110#3356472 (Dzahn) p:Triage>Normal
[01:29:33] Operations, Commons, Multimedia, Traffic, and 2 others: Disable serving unpatrolled new files to Wikipedia Zero users - https://phabricator.wikimedia.org/T167400#3331800 (Catrope) Please also keep in mind that blocking all unpatrolled files for all anons would have a lot of collateral impact: art...
[01:36:31] Operations, Commons, Multimedia, Traffic, and 2 others: Disable serving unpatrolled new files to Wikipedia Zero users - https://phabricator.wikimedia.org/T167400#3331800 (Poyekhali) Unless Commons have a lot of new file patrollers who really mark new files as patrolled (which I see is not the cas...
[01:57:54] PROBLEM - very high load average likely xfs on ms-be1010 is CRITICAL: CRITICAL - load average: 100.58, 100.03, 99.46
[02:02:54] PROBLEM - very high load average likely xfs on ms-be1010 is CRITICAL: CRITICAL - load average: 100.35, 100.05, 99.62
[02:06:54] PROBLEM - very high load average likely xfs on ms-be1010 is CRITICAL: CRITICAL - load average: 100.25, 100.12, 99.75
[02:10:41] (PS1) Dzahn: wikistats: disable cron for wikia - dump file too large (for now) [puppet] - https://gerrit.wikimedia.org/r/359642
[02:11:37] (ms-be1010 is alive and rsyncing things)
[02:12:48] (PS2) Dzahn: wikistats: disable cron for wikia - dump file too large (for now) [puppet] - https://gerrit.wikimedia.org/r/359642 (https://phabricator.wikimedia.org/T165879)
[02:12:54] PROBLEM - very high load average likely xfs on ms-be1010 is CRITICAL: CRITICAL - load average: 99.88, 100.02, 99.83
[02:14:18] (CR) Dzahn: [C: 2] wikistats: disable cron for wikia - dump file too large (for now) [puppet] - https://gerrit.wikimedia.org/r/359642 (https://phabricator.wikimedia.org/T165879) (owner: Dzahn)
[02:16:54] PROBLEM - very high load average likely xfs on ms-be1010 is CRITICAL: CRITICAL - load average: 100.26, 100.01, 99.87
[02:23:43] !log l10nupdate@tin scap sync-l10n completed (1.30.0-wmf.5) (duration: 07m 09s)
[02:23:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:29:51] !log l10nupdate@tin ResourceLoader cache refresh completed at Sat Jun 17 02:29:51 UTC 2017 (duration 6m 8s)
[02:30:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:12:14] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=790.10 Read Requests/Sec=331.30 Write Requests/Sec=0.20 KBytes Read/Sec=42322.00 KBytes_Written/Sec=2.00
[04:23:14] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=0.60 Read Requests/Sec=0.20 Write Requests/Sec=0.50 KBytes Read/Sec=1.20 KBytes_Written/Sec=6.80
[05:06:44] Operations, DBA, Wikimedia-Site-requests: Global rename of Smuconlaw → Sgconlaw: supervision needed - https://phabricator.wikimedia.org/T168109#3356544 (Marostegui) p:Triage>Normal Please ping me before doing this (either here or on IRC) so I can monitor the databases while this is being perf...
[05:10:06] (CR) Marostegui: "We do not manage grants directly with puppet, so deleting the file doesn't imply deleting grants from anywhere. We would actually be addin" [puppet] - https://gerrit.wikimedia.org/r/359152 (https://phabricator.wikimedia.org/T167961) (owner: Marostegui)
[05:12:38] Operations, Commons, Multimedia, Traffic, and 2 others: Disable serving unpatrolled new files to Wikipedia Zero users - https://phabricator.wikimedia.org/T167400#3356547 (Bawolff) >>! In T167400#3356482, @Poyekhali wrote: > Unless Commons have a lot of new file patrollers who really mark new file...
[07:04:51] Operations, Commons, Multimedia, Traffic, and 2 others: Disable serving unpatrolled new files to Wikipedia Zero users - https://phabricator.wikimedia.org/T167400#3356578 (Poyekhali) >>! In T167400#3356547, @Bawolff wrote: >>>! In T167400#3356482, @Poyekhali wrote: >> Unless Commons have a lot of...
[09:44:24] PROBLEM - Juniper alarms on mr1-eqiad is CRITICAL: JNX_ALARMS CRITICAL - No response from remote host 208.80.154.199
[09:44:24] PROBLEM - Router interfaces on mr1-eqiad is CRITICAL: CRITICAL: No response from remote host 208.80.154.199 for 1.3.6.1.2.1.2.2.1.8 with snmp version 2
[09:45:14] RECOVERY - Juniper alarms on mr1-eqiad is OK: JNX_ALARMS OK - 0 red alarms, 0 yellow alarms
[09:46:14] RECOVERY - Router interfaces on mr1-eqiad is OK: OK: host 208.80.154.199, interfaces up: 38, down: 0, dormant: 0, excluded: 0, unused: 0
[09:47:25] Operations, Commons, Multimedia, Traffic, and 2 others: Disable serving unpatrolled new files to Wikipedia Zero users - https://phabricator.wikimedia.org/T167400#3331800 (Younes19956) Hello ! I have a large size image (12.3MB) of a high quality and i want to upload it here to show you some inform...
[10:07:31] Operations, Commons, Multimedia, Traffic, and 2 others: Disable serving unpatrolled new files to Wikipedia Zero users - https://phabricator.wikimedia.org/T167400#3356691 (Framawiki) @Younes19956 Please stop spamming different tasks. You can upload files on external providers such https://imagesha...
[10:11:50] Operations, Commons, Multimedia, Traffic, and 2 others: Disable serving unpatrolled new files to Wikipedia Zero users - https://phabricator.wikimedia.org/T167400#3356707 (Younes19956) >>! In T167400#3356691, @Framawiki wrote: > @Younes19956 Please stop spamming different tasks. You can upload fil...
[10:27:54] PROBLEM - MegaRAID on db1046 is CRITICAL: CRITICAL: 1 LD(s) must have write cache policy WriteBack, currently using: WriteThrough
[10:29:23] ACKNOWLEDGEMENT - MegaRAID on db1046 is CRITICAL: CRITICAL: 1 LD(s) must have write cache policy WriteBack, currently using: WriteThrough Jcrespo https://phabricator.wikimedia.org/T166141
[10:52:50] Operations, DBA, Wikimedia-Site-requests: Global rename of Smuconlaw → Sgconlaw: supervision needed - https://phabricator.wikimedia.org/T168109#3356440 (Framawiki) Please use `per [[:meta:Special:GlobalRenameQueue/request/33965/view|request]] - [[phab:T168109|task]]` in the edit summary, I declined t...
[11:52:37] Operations, Commons, Multimedia, Traffic, and 2 others: Disable serving unpatrolled new files to Wikipedia Zero users - https://phabricator.wikimedia.org/T167400#3356739 (Steinsplitter) >>! In T167400#3356547, @Bawolff wrote: >[...] > To clarify, I would support sending 403s for original file ass...
[12:16:45] Operations, Commons, Multimedia, Traffic, and 2 others: Disable serving unpatrolled new files to Wikipedia Zero users - https://phabricator.wikimedia.org/T167400#3356758 (Framawiki) Perhaps a default text/picture like "This file can't be shown for now" can be better than a 403.
[12:47:28] not sure if this is ops or tech question
[12:47:50] but why is it that when i mark all notifications as read sometimes when i open a new window (definitely new, not an old one that sat) they come back?
[13:42:45] Chrissymad: probably something with cache
[14:32:14] (PS1) Merlijn van Deen: toollabs: install `lame` on exec hosts [puppet] - https://gerrit.wikimedia.org/r/359669
[14:33:42] (PS2) Merlijn van Deen: toollabs: install `lame` on exec hosts [puppet] - https://gerrit.wikimedia.org/r/359669
[14:42:10] (PS1) Framawiki: Create a FeaturedFeed for the Wikimag bulletin on frwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/359670 (https://phabricator.wikimedia.org/T168005)
[14:56:24] (PS3) Multichill: Adding the domain for the Bayerische Staatsgemäldesammlungen [mediawiki-config] - https://gerrit.wikimedia.org/r/355881 (https://phabricator.wikimedia.org/T166437)
[15:26:26] !log rebuild pc2004's (depooled) data from scratch
[15:26:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:44:56] Operations, Wikimedia-General-or-Unknown: Json queries fail "Too Many Requests" - https://phabricator.wikimedia.org/T168033#3357355 (Yurivict) {F8472578} The attached python program runs many requests in many connections concurrently. It saves all responses into the file result.txt After about 30 second...
[16:50:34] RECOVERY - pdfrender on scb2004 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.075 second response time
[16:50:55] RECOVERY - pdfrender on scb2002 is OK: HTTP OK: HTTP/1.1 200 OK - 275 bytes in 0.073 second response time
[16:51:00] !log restarted pdfrender on scb200[2,4] T159922
[16:51:10] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:51:10] T159922: pdfrender fails to serve requests since Mar 8 00:30:32 UTC on scb1003 - https://phabricator.wikimedia.org/T159922
[16:55:02] Operations, Electron-PDFs, Services, Patch-For-Review: pdfrender fails to serve requests since Mar 8 00:30:32 UTC on scb1003 - https://phabricator.wikimedia.org/T159922#3083419 (Volans) There is an ETA for a permanent fix? It seems to me that we've already delayed this too much given the frequenc...
[17:02:44] PROBLEM - Check Varnish expiry mailbox lag on cp1074 is CRITICAL: CRITICAL: expiry mailbox lag is 2037770
[17:02:54] PROBLEM - Debian mirror in sync with upstream on sodium is CRITICAL: /srv/mirrors/debian is over 14 hours old.
[17:24:09] (CR) Hashar: [C: 1] Adding the domain for the Bayerische Staatsgemäldesammlungen [mediawiki-config] - https://gerrit.wikimedia.org/r/355881 (https://phabricator.wikimedia.org/T166437) (owner: Multichill)
[18:08:39] Operations, Wikimedia-General-or-Unknown: Json queries fail "Too Many Requests" - https://phabricator.wikimedia.org/T168033#3357547 (Reedy) Yeah, then almost certainly you're making too many requests in a minute period
[18:12:44] RECOVERY - Check Varnish expiry mailbox lag on cp1074 is OK: OK: expiry mailbox lag is 0
[18:18:27] Operations, Traffic, Wikimedia-General-or-Unknown: Impending load test - https://phabricator.wikimedia.org/T167920#3357585 (Marostegui) @Haiku-narrative is this going to be Thursday 22nd?
[18:32:18] volans: if you are doing the banning and disabling accounts, thanks
[18:33:25] I am for now ;)
[18:34:07] in case you didn't see, the ticket is T168142
[18:34:08] T168142: Cleanup phabricator.wikimedia.org uploaded files, WP zero abuse - https://phabricator.wikimedia.org/T168142
[18:35:16] we really need some sort of abuse protection mechanism in phab. last time phab got abused (I forgot how), account creation was disabled on phab
[18:37:16] there is a long standing task open upstream, I can link it later
[18:37:22] k
[18:46:09] Operations, Wikimedia-General-or-Unknown: Json queries fail "Too Many Requests" - https://phabricator.wikimedia.org/T168033#3357644 (Yurivict) Failing requests isn't a valid response. If I would run this from some company behind the firewall, nobody else will be able to use wikipedia from that IP address...
[18:49:03] Operations, Wikimedia-General-or-Unknown: Json queries fail "Too Many Requests" - https://phabricator.wikimedia.org/T168033#3357645 (Reedy) >>! In T168033#3357644, @Yurivict wrote: > Failing requests isn't a valid response. If I would run this from some company behind the firewall, nobody else will be ab...
[19:06:13] Operations, Wikimedia-General-or-Unknown: Json queries fail "Too Many Requests" - https://phabricator.wikimedia.org/T168033#3357657 (Reedy) https://tools.ietf.org/html/rfc6585 ``` 4. 429 Too Many Requests The 429 status code indicates that the user has sent too many requests in a given amount of...
[19:25:45] (PS3) EBernhardson: [WIP] Add ltr-query 0.1.1 snapshot [software/elasticsearch/plugins] - https://gerrit.wikimedia.org/r/359359
[19:29:14] PROBLEM - Check systemd state on relforge1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[19:30:09] !log restarting elasticsearch on relforge to pick up new version of ltr-query
[19:30:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:35:25] (PS3) BryanDavis: toollabs: install `lame` on exec hosts [puppet] - https://gerrit.wikimedia.org/r/359669 (https://phabricator.wikimedia.org/T168128) (owner: Merlijn van Deen)
[19:36:15] (CR) BryanDavis: [C: 1] toollabs: install `lame` on exec hosts [puppet] - https://gerrit.wikimedia.org/r/359669 (https://phabricator.wikimedia.org/T168128) (owner: Merlijn van Deen)
[19:37:17] Operations, Wikimedia-General-or-Unknown: Json queries fail "Too Many Requests" - https://phabricator.wikimedia.org/T168033#3357680 (Aklapper) Open>Invalid Closing task as invalid.
[19:44:35] Operations, Wikimedia-General-or-Unknown: Json queries fail "Too Many Requests" - https://phabricator.wikimedia.org/T168033#3357737 (Yurivict) > The spec seems to suggest failing requests is a valid response This is absolutely a valid general response code. But I don't see how is it reasonable to fail r...
[19:47:56] volans: fyi, phab apache bytes per sec still seems to be skyrocketing https://grafana.wikimedia.org/dashboard/db/phabricator?orgId=1&from=1496340600395&to=1497723000411
[19:49:36] * https://grafana.wikimedia.org/dashboard/db/phabricator?orgId=1&from=now-7d&to=now
[19:50:33] reduced a bit, but considering the files should all have been deleted it doesn't really make sense
[19:51:05] um https://phabricator.wikimedia.org/p/Smailons/ though
[20:26:26] zhuyifei1999_: that one too was blocked or am I looking at the wrong one? (sorry on mobile)
[20:26:49] blocked by reedy and andre I think
[20:26:54] a while ago
[20:27:21] (as in after I said that)
[20:27:33] thanks anyways
[20:27:48] no, thank you! ;)
[23:48:08] Operations, Patch-For-Review, Technical-Debt: Supersede RT tickets references - https://phabricator.wikimedia.org/T165733#3358033 (Dereckson) Thanks.