[00:06:29] Operations, DC-Ops, Data-Services: Split up labstore external shelf storage available in codfw between labstore2001 and 2 - https://phabricator.wikimedia.org/T171623#3534707 (madhuvishy) @Papaul poke on this task since it's been ~3 weeks. Let me know if you need anything from me to proceed, thank you!
[02:35:05] Operations, DC-Ops, Data-Services: Split up labstore external shelf storage available in codfw between labstore2001 and 2 - https://phabricator.wikimedia.org/T171623#3534808 (Papaul) @madhuvishy This task was not assigned to me and it is no way on the ops-codfw work board so I did not know about the...
[02:35:58] Operations, DC-Ops, Data-Services: Split up labstore external shelf storage available in codfw between labstore2001 and 2 - https://phabricator.wikimedia.org/T171623#3534809 (Papaul) a:Papaul
[02:48:13] Operations, Gerrit, ORES, Scap, and 2 others: Simplify git-fat support for pulling from both production and labs - https://phabricator.wikimedia.org/T171758#3534823 (greg)
[03:17:55] (PS2) Krinkle: Enable jQuery 3 on mediawiki.org and test.wikidata.org [mediawiki-config] - https://gerrit.wikimedia.org/r/372485 (https://phabricator.wikimedia.org/T124742)
[03:18:35] (CR) Krinkle: [C: 2] "Gathering some test data over the weekend." [mediawiki-config] - https://gerrit.wikimedia.org/r/372485 (https://phabricator.wikimedia.org/T124742) (owner: Krinkle)
[03:19:57] (Merged) jenkins-bot: Enable jQuery 3 on mediawiki.org and test.wikidata.org [mediawiki-config] - https://gerrit.wikimedia.org/r/372485 (https://phabricator.wikimedia.org/T124742) (owner: Krinkle)
[03:20:08] (CR) jenkins-bot: Enable jQuery 3 on mediawiki.org and test.wikidata.org [mediawiki-config] - https://gerrit.wikimedia.org/r/372485 (https://phabricator.wikimedia.org/T124742) (owner: Krinkle)
[03:21:24] !log krinkle@tin Synchronized wmf-config/InitialiseSettings.php: Enable jQuery 3 on test.wikidata and mediawiki.org - Id4d42d8c53 (duration: 00m 45s)
[03:21:37] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:29:05] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 779.90 seconds
[04:09:49] Operations, ops-eqiad, DBA, hardware-requests: Decommission db1015 - https://phabricator.wikimedia.org/T173570#3534868 (Peachey88)
[04:14:26] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 252.59 seconds
[05:49:25] Operations, Traffic, Community-Liaisons (Jul-Sep 2017), Patch-For-Review, User-Johan: Communicate dropping IE8-on-XP support (a security change) to affected editors and other community members - https://phabricator.wikimedia.org/T163251#3534887 (EdErhart-WMF) Clearly I don't check my Phab not...
[05:52:26] PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 36 probes of 278 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[05:57:26] RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 16 probes of 278 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[06:19:35] PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 30 probes of 278 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[06:24:35] RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 4 probes of 278 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[07:01:45] PROBLEM - IPv4 ping to codfw on ripe-atlas-codfw is CRITICAL: CRITICAL - failed 28 probes of 278 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[07:06:45] RECOVERY - IPv4 ping to codfw on ripe-atlas-codfw is OK: OK - failed 1 probes of 278 (alerts on 19) - https://atlas.ripe.net/measurements/1791210/#!map
[07:08:01] (PS3) ArielGlenn: dump global block table from central auth [puppet] - https://gerrit.wikimedia.org/r/372507 (https://phabricator.wikimedia.org/T173468)
[07:11:45] (PS4) ArielGlenn: dump global block table from central auth [puppet] - https://gerrit.wikimedia.org/r/372507 (https://phabricator.wikimedia.org/T173468)
[07:16:12] (CR) ArielGlenn: [C: 2] dump global block table from central auth [puppet] - https://gerrit.wikimedia.org/r/372507 (https://phabricator.wikimedia.org/T173468) (owner: ArielGlenn)
[11:32:45] Hi people can we try at least to requeue the renames listed at T173419, please?
[11:32:45] T173419: Unblock stuck global renames at Meta-Wiki - https://phabricator.wikimedia.org/T173419
[11:37:02] Operations, Wikimedia-Site-requests, User-MarcoAurelio, Wikimedia-log-errors: Unblock stuck global renames at Meta-Wiki - https://phabricator.wikimedia.org/T173419#3535057 (MarcoAurelio) p:High>Unbreak! Per my statement at T173419#3532167 and given that this is having production impact, a...
[11:47:42] Operations, Wikimedia-Site-requests, User-MarcoAurelio, Wikimedia-log-errors: Unblock stuck global renames at Meta-Wiki - https://phabricator.wikimedia.org/T173419#3535061 (MarcoAurelio) ```name=How-To, lang=php # Verify if there are running rename jobs you@terbium:~$ mwscript showJobs.php --wiki...
[11:48:19] Reedy: / RainbowSprinkles around?
[12:00:36] PROBLEM - puppet last run on snapshot1001 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 2 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/lib/nagios/plugins/check_sysctl]
[12:09:35] (PS2) Ladsgroup: Add hieradata for ores::celery::workers with default. [puppet] - https://gerrit.wikimedia.org/r/369915 (https://phabricator.wikimedia.org/T169246) (owner: Halfak)
[12:12:17] (CR) Ladsgroup: "I did what Alex said :)" [puppet] - https://gerrit.wikimedia.org/r/369915 (https://phabricator.wikimedia.org/T169246) (owner: Halfak)
[12:28:06] RECOVERY - puppet last run on snapshot1001 is OK: OK: Puppet is currently enabled, last run 48 seconds ago with 0 failures
[12:28:20] Amir1: ping :)
[12:39:03] TabbyCat: what's up
[12:39:21] Amir1: wondering if you have a minute to unblock one stuck global rename
[12:40:03] let me check the maintenance script
[12:42:49] TabbyCat: I can't find it, can you point me to it
[12:43:01] https://phabricator.wikimedia.org/T173419 Amir1
[12:44:06] okay, which user?
[12:44:45] Amir1: mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=metawiki --logwiki=metawiki "Bolsée" "Kathmandu2017"
[12:46:23] got to go, will follow up on Phabricator later
[12:47:15] !log ladsgroup@terbium:~$ mwscript extensions/CentralAuth/maintenance/fixStuckGlobalRename.php --wiki=metawiki --logwiki=metawiki "Bolsée" "Kathmandu2017" (T173419)
[12:47:29] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[12:47:29] T173419: Unblock stuck global renames at Meta-Wiki - https://phabricator.wikimedia.org/T173419
[12:48:04] The first one is gone now, I'm not sure about the others
[12:51:43] Operations, Wikimedia-Site-requests, User-MarcoAurelio, Wikimedia-log-errors: Unblock stuck global renames at Meta-Wiki - https://phabricator.wikimedia.org/T173419#3527677 (Ladsgroup) The first one is gone now, but others are not being picked up it seems.
[12:54:30] Operations, Wikimedia-Site-requests, Regression, User-MarcoAurelio, Wikimedia-log-errors: Unblock stuck global renames at Meta-Wiki - https://phabricator.wikimedia.org/T173419#3535107 (Base)
[13:00:46] !log running the script for ("Clopper228" "CGminded") and ("Gregory.lussier" "StevenSmith83473") (T173419)
[13:00:58] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:00:59] T173419: Unblock stuck global renames at Meta-Wiki - https://phabricator.wikimedia.org/T173419
[13:01:21] I just renamed one user using my right (as global renamer) and it worked just fine without getting stuck on meta
[13:09:14] (CR) Lydia Pintscher: "*ping*" [puppet] - https://gerrit.wikimedia.org/r/360887 (https://phabricator.wikimedia.org/T163922) (owner: Ladsgroup)
[13:17:13] !log another run: (Hotwc3 → HotWC3) (Lamia Bahy → Albedo11) (Monóxido de carbono → Roquetero) (PaulMichaels → PaulBenario) (Rodrigo.dst → RodrigoTavares) (Sadia Tasnim (Moyna) → মুহাম্মদ সুমন মাহমুদ) (Syou 18331322 → Ms3102) (TzvetelinaOOD1 → Tzveti1) (World Para Taekwondo → TKD at World Para Taekwondo) (Yaellerner → Ya1levy777) (平井 俊光 → Toshimit) (T173419)
[13:17:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:17:24] T173419: Unblock stuck global renames at Meta-Wiki - https://phabricator.wikimedia.org/T173419
[13:19:08] hey Amir1 saw that the queue is empty now and renames are being handled again?
[13:19:19] Hey, I just sent an email
[13:19:19] let's see what happens when the new ones arrive to metawiki
[13:19:27] * TabbyCat checks
[13:19:31] I did lots of tests on that
[13:19:32] all work
[13:19:50] I think I just renamed more than ten users
[13:21:02] https://meta.wikimedia.org/wiki/Special:GlobalRenameProgress/TheLadyStranger77 looks stuck at meta, again
[13:21:13] stratch that
[13:21:17] it worked
[13:21:23] (forgot to reload)
[13:36:36] Operations, Wikimedia-Site-requests, Regression, User-MarcoAurelio, Wikimedia-log-errors: Unblock stuck global renames at Meta-Wiki - https://phabricator.wikimedia.org/T173419#3535152 (MarcoAurelio) p:Unbreak!>Normal Reduce to normal as the queue it seems the jobs are being run this t...
[13:38:45] PROBLEM - HHVM jobrunner on mw1164 is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 Service Unavailable - 473 bytes in 0.001 second response time
[13:40:46] RECOVERY - HHVM jobrunner on mw1164 is OK: HTTP OK: HTTP/1.1 200 OK - 202 bytes in 0.002 second response time
[13:49:18] Hello, any free sysadmin who can help in rename request that have more than 50000 edits?
[13:53:51] Alaa: it needs to be done by manuel and radix
[14:02:31] he quit before you replied ;-;
[14:20:51] revi: it seems I have conpherence mode set up here
[14:21:15] oh yeah, phab messaging stuff
[14:23:22] no, not that one
[14:23:42] cz conpherence mode is a mode that lets you ignore join/part messages on crowded channels
[14:50:24] Operations, Traffic, Patch-For-Review, User-notice: Removing support for DES-CBC3-SHA TLS cipher (drops IE8-on-XP support) - https://phabricator.wikimedia.org/T147199#2684468 (GeoffreyT2000) What about IE7 on Windows XP and Windows Vista? If this also no longer has access to Wikipedia, then we sh...
[15:06:15] PROBLEM - puppet last run on graphite1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:14:39] (PS1) Alex Monk: Wikibase on deployment-prep: Exclude non-existent wikis from clientDbList [mediawiki-config] - https://gerrit.wikimedia.org/r/372761 (https://phabricator.wikimedia.org/T173571)
[15:23:55] RECOVERY - salt-minion processes on cp1008 is OK: PROCS OK: 1 process with regex args ^/usr/bin/python /usr/bin/salt-minion
[15:26:55] PROBLEM - salt-minion processes on cp1008 is CRITICAL: PROCS CRITICAL: 0 processes with regex args ^/usr/bin/python /usr/bin/salt-minion
[15:33:36] RECOVERY - puppet last run on graphite1003 is OK: OK: Puppet is currently enabled, last run 28 seconds ago with 0 failures
[15:38:20] Puppet, Beta-Cluster-Infrastructure, Tracking: Deployment-prep hosts with puppet errors (tracking) - https://phabricator.wikimedia.org/T132259#3535193 (Krenair)
[15:38:23] Puppet, Beta-Cluster-Infrastructure: Puppet broken on deployment-pdf01 - https://phabricator.wikimedia.org/T173552#3535190 (Krenair) Open>Resolved a:Krenair Fixed it by adding some stuff to puppet based on the role used in prod: ```profile::redis::master::instances: - 6379 profile::redis::mas...
[15:48:20] (CR) Halfak: "We still have '$celery_workers = 45' in modules/ores/manifests/web.pp Will that be a problem?" [puppet] - https://gerrit.wikimedia.org/r/369915 (https://phabricator.wikimedia.org/T169246) (owner: Halfak)
[16:13:15] Operations, Thumbor, Patch-For-Review, Performance-Team (Radar), User-fgiunchedi: Upgrade Thumbor servers to Stretch - https://phabricator.wikimedia.org/T170817#3535235 (Krenair)
[16:54:10] Operations, Traffic, Patch-For-Review, User-notice: Removing support for DES-CBC3-SHA TLS cipher (drops IE8-on-XP support) - https://phabricator.wikimedia.org/T147199#3535264 (Pigsonthewing) >>! In T147199#3508328, @MaxSem wrote: > If a corporation is insane enough to still run XP and force their...
[16:56:51] Operations, Traffic, Patch-For-Review, User-notice: Removing support for DES-CBC3-SHA TLS cipher (drops IE8-on-XP support) - https://phabricator.wikimedia.org/T147199#3535265 (BBlack) Both IE7 and IE8 for XP are what's being cut off in this transition, with IE8 being the newest IE that's even ava...
[17:09:02] Operations, Traffic, Patch-For-Review, User-notice: Removing support for DES-CBC3-SHA TLS cipher (drops IE8-on-XP support) - https://phabricator.wikimedia.org/T147199#3535279 (BBlack) >>! In T147199#3535264, @Pigsonthewing wrote: >>>! In T147199#3508328, @MaxSem wrote: >> If a corporation is insa...
[17:16:29] (PS1) Alex Monk: Fix mwrepl to require expanddblist dependency, from scap::scripts [puppet] - https://gerrit.wikimedia.org/r/372764
[17:16:52] (CR) jerkins-bot: [V: -1] Fix mwrepl to require expanddblist dependency, from scap::scripts [puppet] - https://gerrit.wikimedia.org/r/372764 (owner: Alex Monk)
[17:19:49] (PS2) Alex Monk: Fix mwrepl to require expanddblist dependency, from scap::scripts [puppet] - https://gerrit.wikimedia.org/r/372764
[17:39:16] PROBLEM - Check Varnish expiry mailbox lag on cp1049 is CRITICAL: CRITICAL: expiry mailbox lag is 2006782
[18:03:34] (CR) Ebe123: "> The underlying patches providing the wrappers have now been merged" [mediawiki-config] - https://gerrit.wikimedia.org/r/370358 (https://phabricator.wikimedia.org/T172582) (owner: Ebe123)
[18:04:34] (CR) Jayprakash12345: [C: 1] Added Cookbook and Cookbook talk NS on hi.wikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/372387 (https://phabricator.wikimedia.org/T173398) (owner: MarcoAurelio)
[18:20:07] (PS1) MarcoAurelio: Increase AbuseFilter autodisable thresholds for Meta-Wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/372768 (https://phabricator.wikimedia.org/T173633)
[20:37:39] Operations, DBA, User-MarcoAurelio: Evaluate how hard would be to get aa(wikibooks|wiktionary) and howiki databases deleted - https://phabricator.wikimedia.org/T169928#3535429 (Jayprakash12345)
[20:38:38] ohnoes, where would all our tests go :)
[21:23:45] CI is probably going to warn about having a giant queue in a few minutes
[21:32:23] Seems like there's something wrong with zuul? I really think the jobs shouldn't take that long…
[21:34:50] Oh, I think I see what's happened
[21:35:07] Everything seems to be okay, it's just a bit much at a time
[21:35:15] PROBLEM - Work requests waiting in Zuul Gearman server https://grafana.wikimedia.org/dashboard/db/zuul-gearman on contint1001 is CRITICAL: CRITICAL: 35.71% of data above the critical threshold [140.0]
[21:38:15] PROBLEM - Work requests waiting in Zuul Gearman server https://grafana.wikimedia.org/dashboard/db/zuul-gearman on contint1001 is CRITICAL: CRITICAL: 50.00% of data above the critical threshold [140.0]
[22:22:45] PROBLEM - Work requests waiting in Zuul Gearman server https://grafana.wikimedia.org/dashboard/db/zuul-gearman on contint1001 is CRITICAL: CRITICAL: 35.71% of data above the critical threshold [140.0]
[22:30:55] PROBLEM - Work requests waiting in Zuul Gearman server https://grafana.wikimedia.org/dashboard/db/zuul-gearman on contint1001 is CRITICAL: CRITICAL: 33.33% of data above the critical threshold [140.0]
[22:49:35] RECOVERY - Check Varnish expiry mailbox lag on cp1049 is OK: OK: expiry mailbox lag is 0
[22:49:56] RECOVERY - Work requests waiting in Zuul Gearman server https://grafana.wikimedia.org/dashboard/db/zuul-gearman on contint1001 is OK: OK: Less than 30.00% above the threshold [90.0]
[22:50:22] (CR) Platonides: [C: 1] Increase AbuseFilter autodisable thresholds for Meta-Wiki [mediawiki-config] - https://gerrit.wikimedia.org/r/372768 (https://phabricator.wikimedia.org/T173633) (owner: MarcoAurelio)