[03:07:54] Operations, LDAP-Access-Requests, Patch-For-Review: LDAP access to the wmf group for Pita - https://phabricator.wikimedia.org/T247722 (Jpita) where should I try to login to check if I have access?
[03:50:24] (PS1) Ladsgroup: admin: Change my SSH key [puppet] - https://gerrit.wikimedia.org/r/584187
[04:14:13] (PS1) Andrew Bogott: neutron: enable l3_agent_only_dmz_cidr_hack in codfw1dev [puppet] - https://gerrit.wikimedia.org/r/584188 (https://phabricator.wikimedia.org/T247505)
[05:17:18] PROBLEM - Host elastic1059 is DOWN: PING CRITICAL - Packet loss = 100%
[05:21:54] PROBLEM - Number of backend failures per minute from CirrusSearch on graphite1004 is CRITICAL: CRITICAL: 20.00% of data above the critical threshold [600.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?orgId=1&var-cluster=eqiad&var-smoothing=1&panelId=9&fullscreen
[05:26:14] RECOVERY - Number of backend failures per minute from CirrusSearch on graphite1004 is OK: OK: Less than 20.00% above the threshold [300.0] https://wikitech.wikimedia.org/wiki/Search%23Health/Activity_Monitoring https://grafana.wikimedia.org/dashboard/db/elasticsearch-percentiles?orgId=1&var-cluster=eqiad&var-smoothing=1&panelId=9&fullscreen
[06:34:04] PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1004-cloudelastic-chi-eqiad on cloudelastic1004 is CRITICAL: 111.9 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1004&panelId=37
[06:45:34] PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is CRITICAL: 106.8 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1003&panelId=37
[06:49:42] PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1002-cloudelastic-chi-eqiad on cloudelastic1002 is CRITICAL: 106.8 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1002&panelId=37
[07:18:13] (PS10) L0st3xpl0r3r: transfer.py: Convert return for run() from int to list [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/583960 (https://phabricator.wikimedia.org/T248661)
[07:18:40] (PS11) L0st3xpl0r3r: transfer.py: Convert return for run() from int to list [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/583960 (https://phabricator.wikimedia.org/T248661)
[07:20:12] (CR) L0st3xpl0r3r: "> Patch Set 9:" [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/583960 (https://phabricator.wikimedia.org/T248661) (owner: L0st3xpl0r3r)
[07:25:32] RECOVERY - Rate of JVM GC Old generation-s runs - cloudelastic1002-cloudelastic-chi-eqiad on cloudelastic1002 is OK: (C)100 gt (W)80 gt 76.27 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1002&panelId=37
[08:02:56] RECOVERY - HTTPS-planet on en.planet.wikimedia.org is OK: SSL OK - Certificate *.wikipedia.org valid until 2020-06-20 07:01:41 +0000 (expires in 82 days) https://wikitech.wikimedia.org/wiki/Planet.wikimedia.org
[08:19:06] (PS1) Elukey: profile::analytics::refinery::job::refine: exclude TwoColConflictExit [puppet] - https://gerrit.wikimedia.org/r/584189
[08:20:28] PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is CRITICAL: 104.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1003&panelId=37
[08:24:04] !log powercycle elastic1059 - mgmt/serial console stuck, no ssh - racadm getsel shows a lot of OEM errors occurred, nothing specific
[08:24:07] cc: gehel: --^
[08:24:08] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:26:50] RECOVERY - Host elastic1059 is UP: PING WARNING - Packet loss = 77%, RTA = 0.17 ms
[08:37:21] (CR) Elukey: [C: +2] profile::analytics::refinery::job::refine: exclude TwoColConflictExit [puppet] - https://gerrit.wikimedia.org/r/584189 (owner: Elukey)
[08:41:26] PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is CRITICAL: 100.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1003&panelId=37
[08:45:40] PROBLEM - Rate of JVM GC Old generation-s runs - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is CRITICAL: 102.7 gt 100 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1003&panelId=37
[08:55:48] these alerts are kind of expected, the cluster is not big enough and it is currently handling indexations --^
[09:34:39] elukey: thanks for the ping !
[10:45:28] RECOVERY - Rate of JVM GC Old generation-s runs - cloudelastic1003-cloudelastic-chi-eqiad on cloudelastic1003 is OK: (C)100 gt (W)80 gt 75.25 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1003&panelId=37
[11:18:00] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3054 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[11:20:26] RECOVERY - Rate of JVM GC Old generation-s runs - cloudelastic1004-cloudelastic-chi-eqiad on cloudelastic1004 is OK: (C)100 gt (W)80 gt 72.2 https://wikitech.wikimedia.org/wiki/Search%23Using_jstack_or_jmap_or_other_similar_tools_to_view_logs https://grafana.wikimedia.org/d/000000462/elasticsearch-memory?orgId=1&var-exported_cluster=cloudelastic-chi-eqiad&var-instance=cloudelastic1004&panelId=37
[11:22:10] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3054 is OK: HTTP OK: HTTP/1.0 200 OK - 22361 bytes in 1.011 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[11:28:38] PROBLEM - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3054 is CRITICAL: CRITICAL - Socket timeout after 10 seconds https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[11:35:42] Operations, Traffic, Wikimedia-General-or-Unknown, User-DannyS712: Pages whose title ends with semicolon (;) are intermittently inaccessible - https://phabricator.wikimedia.org/T238285 (Bugreporter)
[11:36:08] RECOVERY - Debian mirror in sync with upstream on sodium is OK: /srv/mirrors/debian is over 0 hours old. https://wikitech.wikimedia.org/wiki/Mirrors
[11:39:06] RECOVERY - Ensure traffic_exporter binds on port 9322 and responds to HTTP requests on cp3054 is OK: HTTP OK: HTTP/1.0 200 OK - 22372 bytes in 0.259 second response time https://wikitech.wikimedia.org/wiki/Apache_Traffic_Server
[13:42:09] Is something wrong with graphoid? I'm noticing a bunch of broken images related to the graph extension (e.g. https://en.wikipedia.org/wiki/Template:Graph:Map )
[13:42:20] No idea if this is new brokenness, or maybe those examples have always been broken
[13:42:49] I've noticed on some other template pages as well
[13:49:07] meh, leaning towards it's broken on wiki
[13:51:15] bawolff: do you mind filing a task and cc'ing me?
[13:51:48] I looked up in logstash, and it's complaining about not being able to load a template on wiki, and the template seems to indeed not exist
[13:52:00] So I think this actually is just someone broke the template on wiki
[13:52:19] ah, okay
[13:55:40] graphoid: Page content not available wikiraw:///Module%3AGraph%2FWorldMap-iso2.json
[13:55:42] ?
[13:58:32] yeah
[13:58:35] I found the edit
[13:58:51] It's apparently been broken for like a week, and used on 500 pages. Clearly people really value their maps ;)
[13:59:44] I'd say the fact that there is no error message in the interface for something like this is kind of a bug in the graphs extension... but meh
[13:59:54] Anyways, sorry for bothering everyone :)
[14:00:19] https://en.wikipedia.org/w/index.php?title=Module%3AGraph&type=revision&diff=947978403&oldid=946859819 seems to have fixed things
[14:19:45] Operations, LDAP-Access-Requests, Patch-For-Review: LDAP access to the wmf group for Pita - https://phabricator.wikimedia.org/T247722 (Aklapper) @Jpita: On wikitech, see the previous comment. (If your question was about logging into the `Jose pita` account.)
[15:35:06] Operations, Gerrit, Traffic, HTTPS, and 2 others: Disable TLS 1.0 and 1.1 in apache for gerrit.wikimedia.org - https://phabricator.wikimedia.org/T221499 (Krenair) Yeah, when I wrote it, it probably did have the effect of disabling TLS 1.0/1.1 - but since then https://gerrit.wikimedia.org/r/c/oper...
[16:17:08] (PS1) Andrew Bogott: ssh-key-ldap-lookup: port to python3 [puppet] - https://gerrit.wikimedia.org/r/584211
[16:21:44] (CR) Andrew Bogott: "This is a very small change but very scary since it could break logins for our users." [puppet] - https://gerrit.wikimedia.org/r/584211 (owner: Andrew Bogott)
[16:35:33] Operations, SRE-Access-Requests: Add aaron, dpifke and phedenskog to analytics-privatedata-users - https://phabricator.wikimedia.org/T248797 (Gilles)
[16:36:30] (CR) Alex Monk: [C: -1] ssh-key-ldap-lookup: port to python3 (1 comment) [puppet] - https://gerrit.wikimedia.org/r/584211 (owner: Andrew Bogott)
[16:37:48] (CR) Gilles: [C: +1] ATS: remove debug HTTP headers if X-Wikimedia-Debug is absent (1 comment) [puppet] - https://gerrit.wikimedia.org/r/583570 (https://phabricator.wikimedia.org/T210484) (owner: Ema)
[16:38:19] (CR) Gilles: [C: +1] cache: stop sending X-Varnish [puppet] - https://gerrit.wikimedia.org/r/583942 (https://phabricator.wikimedia.org/T210484) (owner: Ema)
[16:38:50] (CR) Gilles: [C: +1] ATS: remove debug HTTP headers if X-Wikimedia-Debug is absent (1 comment) [puppet] - https://gerrit.wikimedia.org/r/583570 (https://phabricator.wikimedia.org/T210484) (owner: Ema)
[16:58:52] nowiki seems to be dead?
[16:59:15] loading again
[16:59:16] (PS2) Andrew Bogott: ssh-key-ldap-lookup: port to python3 [puppet] - https://gerrit.wikimedia.org/r/584211
[17:12:48] It was a huge page (1MB) that caused the trouble.
[19:58:19] (PS2) Krinkle: jenkins: Adjust CSP header to allow inline CSS and video playback [puppet] - https://gerrit.wikimedia.org/r/582604 (https://phabricator.wikimedia.org/T245658) (owner: Brian Wolff)
[19:58:59] (CR) Krinkle: [C: +1] jenkins: Adjust CSP header to allow inline CSS and video playback [puppet] - https://gerrit.wikimedia.org/r/582604 (https://phabricator.wikimedia.org/T245658) (owner: Brian Wolff)
[20:18:06] (CR) Krinkle: [C: +1] cache: stop sending X-Varnish [puppet] - https://gerrit.wikimedia.org/r/583942 (https://phabricator.wikimedia.org/T210484) (owner: Ema)
[20:34:10] PROBLEM - snapshot of s1 in eqiad on db1115 is CRITICAL: snapshot for s1 at eqiad taken more than 3 days ago: Most recent backup 2020-03-26 20:29:20 https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[20:56:30] PROBLEM - snapshot of s1 in codfw on db1115 is CRITICAL: snapshot for s1 at codfw taken more than 3 days ago: Most recent backup 2020-03-26 20:32:49 https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[21:36:14] PROBLEM - snapshot of s8 in eqiad on db1115 is CRITICAL: snapshot for s8 at eqiad taken more than 3 days ago: Most recent backup 2020-03-26 21:21:37 https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[21:58:40] RECOVERY - snapshot of s1 in codfw on db1115 is OK: snapshot for s1 at codfw taken less than 3 days ago and larger than 90 GB: Last one 2020-03-29 20:33:57 from db2097.codfw.wmnet:3311 (983 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[21:59:22] PROBLEM - snapshot of s8 in codfw on db1115 is CRITICAL: snapshot for s8 at codfw taken more than 3 days ago: Most recent backup 2020-03-26 21:25:46 https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[22:07:30] RECOVERY - snapshot of s1 in eqiad on db1115 is OK: snapshot for s1 at eqiad taken less than 3 days ago and larger than 90 GB: Last one 2020-03-29 20:29:41 from db1139.eqiad.wmnet:3311 (958 GB) https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[22:31:16] PROBLEM - IPv6 ping to ulsfo on ripe-atlas-ulsfo IPv6 is CRITICAL: CRITICAL - failed 57 probes of 546 (alerts on 50) - https://atlas.ripe.net/measurements/1791309/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[22:40:16] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 53 probes of 546 (alerts on 50) - https://atlas.ripe.net/measurements/1790947/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[22:43:42] PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 57 probes of 546 (alerts on 50) - https://atlas.ripe.net/measurements/1791212/#!map https://wikitech.wikimedia.org/wiki/Network_monitoring%23Atlas_alerts https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas
[23:36:18] Operations, Traffic, Wikimedia-General-or-Unknown, User-DannyS712: Pages whose title ends with semicolon (;) are intermittently inaccessible - https://phabricator.wikimedia.org/T238285 (Krinkle) >>! In T238285#5666483, @Vgutierrez wrote: > BTW, Checking RFC 3986, I'm not sure that `https://ban.wi...
[23:42:04] PROBLEM - snapshot of s4 in eqiad on db1115 is CRITICAL: snapshot for s4 at eqiad taken more than 3 days ago: Most recent backup 2020-03-26 23:30:44 https://wikitech.wikimedia.org/wiki/MariaDB/Backups
[23:52:56] PROBLEM - snapshot of s4 in codfw on db1115 is CRITICAL: snapshot for s4 at codfw taken more than 3 days ago: Most recent backup 2020-03-26 23:37:48 https://wikitech.wikimedia.org/wiki/MariaDB/Backups