[00:04:40] PROBLEM - Check systemd state on netflow3001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:05:18] PROBLEM - Check systemd state on netflow5001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:05:22] PROBLEM - Check systemd state on netflow4001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:05:32] PROBLEM - Check systemd state on netflow2001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:05:50] PROBLEM - Check systemd state on netflow1001 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[00:18:36] RECOVERY - HTTPS-dbtree on dbmonitor1001 is OK: HTTP OK: HTTP/1.1 200 OK - 91680 bytes in 6.273 second response time https://wikitech.wikimedia.org/wiki/Dbtree.wikimedia.org
[03:46:54] PROBLEM - SSH on webperf2002 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/SSH/monitoring
[03:48:42] RECOVERY - SSH on webperf2002 is OK: SSH OK - OpenSSH_7.4p1 Debian-10+deb9u7 (protocol 2.0) https://wikitech.wikimedia.org/wiki/SSH/monitoring
[04:20:04] PROBLEM - Check systemd state on webperf2002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed. https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[04:28:02] RECOVERY - Check systemd state on webperf2002 is OK: OK - running: The system is fully operational https://wikitech.wikimedia.org/wiki/Monitoring/check_systemd_state
[06:28:32] PROBLEM - Cxserver LVS eqiad on cxserver.svc.eqiad.wmnet is CRITICAL: /v2/suggest/sections/{title}/{from}/{to} (Suggest source sections to translate) timed out before a response was received https://wikitech.wikimedia.org/wiki/CX
[06:30:20] RECOVERY - Cxserver LVS eqiad on cxserver.svc.eqiad.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/CX
[07:00:04] Deploy window No deploys all day! See Deployments/Emergencies if things are broken. (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20200816T0700)
[09:29:40] RECOVERY - WDQS high update lag on wdqs1004 is OK: (C)4.32e+04 ge (W)2.16e+04 ge 2.135e+04 https://wikitech.wikimedia.org/wiki/Wikidata_query_service/Runbook%23Update_lag https://grafana.wikimedia.org/dashboard/db/wikidata-query-service?orgId=1&panelId=8&fullscreen
[11:12:33] !log repooling wdqs1004 - caught up on lag
[11:12:36] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:12:36] ryankemper: cc ^
[11:16:36] PROBLEM - Stale file for node-exporter textfile in eqiad on icinga1001 is CRITICAL: cluster=api_appserver file={nic_firmware.prom,phpfpm-statustext.prom} instance=mw1276 job=node site=eqiad https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/knkl4dCWz/node-exporter-textfile
[11:25:32] PROBLEM - Number of messages locally queued by purged for processing on cp3050 is CRITICAL: cluster=cache_text instance=cp3050 job=purged layer=backend site=esams https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3050
[11:31:26] RECOVERY - Number of messages locally queued by purged for processing on cp3050 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3050
[11:46:03] Operations, Traffic, netops: Users of Jio ISP (India, AS 55836) unable to reach Wikimedia sites - https://phabricator.wikimedia.org/T260449 (Josve05a) For reference https://ticket.wikimedia.org/otrs/index.pl?Action=AgentTicketZoom;TicketID=11476976
[13:01:46] PROBLEM - Number of messages locally queued by purged for processing on cp3056 is CRITICAL: cluster=cache_text instance=cp3056 job=purged layer=backend site=esams https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3056
[13:02:02] PROBLEM - Number of messages locally queued by purged for processing on cp1079 is CRITICAL: cluster=cache_text instance=cp1079 job=purged layer=backend site=eqiad https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=eqiad+prometheus/ops&var-instance=cp1079
[13:02:52] PROBLEM - Number of messages locally queued by purged for processing on cp3054 is CRITICAL: cluster=cache_text instance=cp3054 job=purged layer=backend site=esams https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3054
[13:03:54] PROBLEM - Number of messages locally queued by purged for processing on cp3050 is CRITICAL: cluster=cache_text instance=cp3050 job=purged layer=backend site=esams https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3050
[13:04:04] PROBLEM - Number of messages locally queued by purged for processing on cp3052 is CRITICAL: cluster=cache_text instance=cp3052 job=purged layer=backend site=esams https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3052
[13:04:33] Operations, Traffic, netops: Users of Jio ISP (India, AS 55836) unable to reach Wikimedia sites - https://phabricator.wikimedia.org/T260449 (CDanis) >>! In T260449#6387317, @Josve05a wrote: > For reference https://ticket.wikimedia.org/otrs/index.pl?Action=AgentTicketZoom;TicketID=11476976 I don't ha...
[13:09:40] RECOVERY - Number of messages locally queued by purged for processing on cp3056 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3056
[13:11:58] PROBLEM - Number of messages locally queued by purged for processing on cp3052 is CRITICAL: cluster=cache_text instance=cp3052 job=purged layer=backend site=esams https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3052
[13:12:44] PROBLEM - Number of messages locally queued by purged for processing on cp3054 is CRITICAL: cluster=cache_text instance=cp3054 job=purged layer=backend site=esams https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3054
[13:15:56] PROBLEM - Number of messages locally queued by purged for processing on cp3052 is CRITICAL: cluster=cache_text instance=cp3052 job=purged layer=backend site=esams https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3052
[13:17:54] RECOVERY - Number of messages locally queued by purged for processing on cp3052 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3052
[13:22:36] RECOVERY - Number of messages locally queued by purged for processing on cp3054 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3054
[13:25:38] RECOVERY - Number of messages locally queued by purged for processing on cp3050 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=esams+prometheus/ops&var-instance=cp3050
[13:29:38] RECOVERY - Number of messages locally queued by purged for processing on cp1079 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Purged%23Alerts https://grafana.wikimedia.org/dashboard/db/purged?var-datasource=eqiad+prometheus/ops&var-instance=cp1079
[14:35:36] (PS1) Evrifaessa: Add Turkish powered by MW and Wikimedia project icons [mediawiki-config] - https://gerrit.wikimedia.org/r/620507 (https://phabricator.wikimedia.org/T260492)
[14:43:14] (CR) Urbanecm: "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/620507 (https://phabricator.wikimedia.org/T260492) (owner: Evrifaessa)
[14:48:56] (CR) Urbanecm: [C: +1] "LGTM" [mediawiki-config] - https://gerrit.wikimedia.org/r/620507 (https://phabricator.wikimedia.org/T260492) (owner: Evrifaessa)
[15:05:58] hello everyone
[15:06:04] I just made this commit: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/620507
[15:06:19] but I don't know what to do next, how do I get it reviewed and merged?
[15:06:45] on the right, it says "Merge conflicts", am I supposed to fix those conflicts? or will they automatically get fixed by the system?
[15:09:46] Evrifaessa: you should see a button allowing you to rebase but I wouldn't worry, whoever deploys at the time you choose can also do it
[15:12:36] So, how do I queue my commit for deployment? I'm a first-time contributor in Gerrit, so please forgive my mistakes lol
[15:16:06] Evrifaessa: you can read the instructions and schedule it on https://wikitech.wikimedia.org/wiki/Deployments :)
[15:16:52] (the instructions are basically: be available in this channel at the time for which you scheduled the patch)
[15:17:55] ah, also, you should schedule it on the backport windows
[15:21:10] dont|panic: uh, I'm way too new in this. these all seem a bit complicated for me, can you please deploy the changes instead of me, as you're way more experienced in this?
[15:30:54] Evrifaessa: it's really simple, just pick a "backport window" that you can attend and add your patch to that window on the table. then you need to appear here on that window and the deployer will guide you
[15:31:10] Majavah: https://wikitech.wikimedia.org/w/index.php?title=Deployments&type=revision&diff=1877875&oldid=1877838
[15:31:18] did I do it correctly?
[15:31:38] Evrifaessa: that's the "Wikidata Query Service weekly deploy" deploy window, not a backport window
[15:34:57] oh
[15:34:58] https://wikitech.wikimedia.org/w/index.php?title=Deployments&type=revision&diff=1877877&oldid=1877876
[15:35:05] Majavah: correct now?
[15:38:35] Evrifaessa: yes, now you just need to be on this channel on the window and the deployers will guide you thru the process
[15:38:47] yay, thanks for your help :)
[16:03:48] (PS1) Evrifaessa: Add Turkish powered by MW and Wikimedia project icons for Turkish Wikiquote, Turkish Wiktionary, Turkish Wikisource and Turkish Wikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/620509 (https://phabricator.wikimedia.org/T260493)
[16:31:26] (CR) Evrifaessa: [C: +1] Add Turkish powered by MW and Wikimedia project icons for Turkish Wikiquote, Turkish Wiktionary, Turkish Wikisource and Turkish Wikibooks [mediawiki-config] - https://gerrit.wikimedia.org/r/620509 (https://phabricator.wikimedia.org/T260493) (owner: Evrifaessa)
[17:04:34] (PS1) Evrifaessa: Change the lzh Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/620510 (https://phabricator.wikimedia.org/T259006)
[17:07:27] (CR) Evrifaessa: [C: +1] "Could you please review this?" [mediawiki-config] - https://gerrit.wikimedia.org/r/620510 (https://phabricator.wikimedia.org/T259006) (owner: Evrifaessa)
[17:09:20] (PS2) Evrifaessa: Change the logo of lzh Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/620510 (https://phabricator.wikimedia.org/T259006)
[17:18:54] PROBLEM - High average GET latency for mw requests on appserver in eqiad on icinga1001 is CRITICAL: cluster=appserver code=200 handler=proxy:unix:/run/php/fpm-www.sock https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[17:30:48] RECOVERY - High average GET latency for mw requests on appserver in eqiad on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Monitoring/Missing_notes_link https://grafana.wikimedia.org/d/RIA1lzDZk/application-servers-red-dashboard?panelId=9&fullscreen&orgId=1&from=now-3h&to=now&var-datasource=eqiad+prometheus/ops&var-cluster=appserver&var-method=GET
[17:36:30] (PS1) Evrifaessa: Add Wiktionary wordmark for eswiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/620513 (https://phabricator.wikimedia.org/T254059)
[17:37:22] (CR) Evrifaessa: [C: +1] Add Wiktionary wordmark for eswiktionary [mediawiki-config] - https://gerrit.wikimedia.org/r/620513 (https://phabricator.wikimedia.org/T254059) (owner: Evrifaessa)
[17:38:07] Hello again, how do I get jenkins-bot to review my latest commits?
[17:41:07] Evrifaessa: the bot automatically checks changes submitted by users in the bot's review list - to get added, please file a Task; alternatively any user in the jenkins-bot list can comment 'recheck' on your patches to manually trigger those checks
[17:41:13] Evrifaessa: either get added to the whitelist or like someone who can
[17:41:47] (CR) RhinosF1: [C: +1] "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/620513 (https://phabricator.wikimedia.org/T254059) (owner: Evrifaessa)
[17:42:07] If you don't commit frequently, asking someone to run them manually is better I guess
[17:42:10] RhinosF1: s/whitelist/allowlist/g
[17:42:21] Majavah: oh ye
[17:42:33] well, it's still a whitelist
[17:43:17] eh, I'm a new contributor to Gerrit, so can someone please prompt the bot to review my latest commits? I only made 4 commits today.
[17:43:40] (CR) RhinosF1: [C: +1] "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/620510 (https://phabricator.wikimedia.org/T259006) (owner: Evrifaessa)
[17:44:10] (CR) RhinosF1: [C: +1] "recheck" [mediawiki-config] - https://gerrit.wikimedia.org/r/620509 (https://phabricator.wikimedia.org/T260493) (owner: Evrifaessa)
[17:44:27] Evrifaessa: I think RhinosF1 is running them for you :-)
[17:44:39] Evrifaessa: I think I did them all, let me know if I missed one
[17:44:55] RhinosF1: you did all, thanks :)
[17:45:02] Perfect
[17:45:54] hmm. it seems like there is a failure here: https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/620510/1
[17:46:26] Evrifaessa: that's expected if I'm reading it right
[17:46:31] It's non-voting
[17:47:08] diffConfig failures are expected and normal when changing something other than a config variable
[17:47:31] oh, so that shouldn't be a matter ig
[17:47:34] thanks :)
[21:31:04] PROBLEM - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is CRITICAL: /v2/suggest/source/{title}/{to} (Suggest a source title to use for translation) timed out before a response was received https://wikitech.wikimedia.org/wiki/CX
[21:35:00] RECOVERY - Cxserver LVS codfw on cxserver.svc.codfw.wmnet is OK: All endpoints are healthy https://wikitech.wikimedia.org/wiki/CX
[23:14:32] PROBLEM - Varnish traffic drop between 30min ago and now at esams on icinga1001 is CRITICAL: 39.94 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[23:17:24] PROBLEM - Varnish traffic drop between 30min ago and now at eqsin on icinga1001 is CRITICAL: 22.4 le 60 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[23:34:44] RECOVERY - Varnish traffic drop between 30min ago and now at eqsin on icinga1001 is OK: All metrics within thresholds. https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1
[23:36:08] RECOVERY - Varnish traffic drop between 30min ago and now at esams on icinga1001 is OK: (C)60 le (W)70 le 80.29 https://wikitech.wikimedia.org/wiki/Varnish%23Diagnosing_Varnish_alerts https://grafana.wikimedia.org/dashboard/db/varnish-http-requests?panelId=6&fullscreen&orgId=1