[00:01:33] (PS1) Dzahn: misc:varnish: rename bromine director, clean up unused design director [puppet] - https://gerrit.wikimedia.org/r/420137 (https://phabricator.wikimedia.org/T188163)
[00:07:51] (CR) Dzahn: [C: 2] misc:varnish: rename bromine director, clean up unused design director [puppet] - https://gerrit.wikimedia.org/r/420137 (https://phabricator.wikimedia.org/T188163) (owner: Dzahn)
[00:13:19] !log running puppet on all cache::misc to rename director bromine to webserver_misc_static (T188163)
[00:13:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[00:13:25] T188163: create codfw-equivalent of bromine, make webserver_misc_static active/active in misc varnish - https://phabricator.wikimedia.org/T188163
[00:31:12] (PS7) Krinkle: webperf: Always record country specific when oversampling [puppet] - https://gerrit.wikimedia.org/r/419738 (https://phabricator.wikimedia.org/T189780) (owner: Imarlier)
[00:31:15] (CR) Krinkle: [C: 1] webperf: Always record country specific when oversampling [puppet] - https://gerrit.wikimedia.org/r/419738 (https://phabricator.wikimedia.org/T189780) (owner: Imarlier)
[00:36:01] Operations, Patch-For-Review: create codfw-equivalent of bromine, make webserver_misc_static active/active in misc varnish - https://phabricator.wikimedia.org/T188163#4058010 (Krinkle)
[00:36:22] Operations, Availability, Patch-For-Review: create codfw-equivalent of bromine, make webserver_misc_static active/active in misc varnish - https://phabricator.wikimedia.org/T188163#3998458 (Krinkle)
[00:43:50] RECOVERY - Ubuntu mirror in sync with upstream on sodium is OK: /srv/mirrors/ubuntu is over 1 hours old.
[00:46:52] (PS1) Dzahn: cache::misc: switch webserver_misc_static to codfw backend [puppet] - https://gerrit.wikimedia.org/r/420142 (https://phabricator.wikimedia.org/T188163)
[00:56:21] (CR) Dzahn: [C: -1] "think i'm supposed to first enable both and not switch between them in a single step" [puppet] - https://gerrit.wikimedia.org/r/420142 (https://phabricator.wikimedia.org/T188163) (owner: Dzahn)
[01:15:12] (PS1) Dzahn: site: enable mapped IPv6 on bromine/vega [puppet] - https://gerrit.wikimedia.org/r/420143
[01:15:40] (CR) jerkins-bot: [V: -1] site: enable mapped IPv6 on bromine/vega [puppet] - https://gerrit.wikimedia.org/r/420143 (owner: Dzahn)
[01:16:56] (PS1) Dzahn: site: remove mapped IPv6 from californium [puppet] - https://gerrit.wikimedia.org/r/420144
[01:20:59] Operations, Cloud-Services, DC-Ops: decom californium - https://phabricator.wikimedia.org/T189921#4058041 (Dzahn) p:Triage>Normal
[01:21:12] (PS1) Dzahn: remove californium.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/420145 (https://phabricator.wikimedia.org/T189921)
[01:21:47] (PS2) Dzahn: site: remove mapped IPv6 from californium [puppet] - https://gerrit.wikimedia.org/r/420144 (https://phabricator.wikimedia.org/T189921)
[01:22:13] Operations, Cloud-Services, DC-Ops, Patch-For-Review: decom californium - https://phabricator.wikimedia.org/T189921#4058056 (Dzahn) a:Andrew>None
[01:23:17] Operations, Cloud-Services, DC-Ops, Patch-For-Review: decom californium - https://phabricator.wikimedia.org/T189921#4058041 (Dzahn)
[01:24:06] Operations, Cloud-Services, DC-Ops, hardware-requests: decom californium - https://phabricator.wikimedia.org/T189921#4058041 (Dzahn)
[01:28:47] (PS1) Dzahn: decom californium from site and install_server [puppet] - https://gerrit.wikimedia.org/r/420147 (https://phabricator.wikimedia.org/T189921)
[01:29:44] Operations, Cloud-Services, DC-Ops, hardware-requests, Patch-For-Review: decom californium - https://phabricator.wikimedia.org/T189921#4058062 (Dzahn)
[01:37:05] Critical Alert for device cr2-codfw.wikimedia.org - Primary inbound port utilisation over 80%
[01:42:04] C̶r̶i̶t̶i̶c̶a̶l Device cr2-codfw.wikimedia.org recovered from Primary inbound port utilisation over 80%
[01:44:18] (CR) Imarlier: [C: 1] webperf: Always record country specific when oversampling [puppet] - https://gerrit.wikimedia.org/r/419738 (https://phabricator.wikimedia.org/T189780) (owner: Imarlier)
[01:52:04] Critical Alert for device cr2-codfw.wikimedia.org - Primary inbound port utilisation over 80%
[01:54:21] Hmm that’s new
[01:57:05] C̶r̶i̶t̶i̶c̶a̶l Device cr2-codfw.wikimedia.org recovered from Primary inbound port utilisation over 80%
[01:57:50] Is codfw down?
[01:59:22] paladox: yeah, librenms is also alerting to IRC now
[01:59:36] Ah thanks
[01:59:55] One of the analytics hosts (furud) is pulling 6+ Gbps from eqiad, and bringing one of the eqiad/codfw link to a risky level
[02:00:23] Oh
[02:05:07] * quiddity adds to https://wikitech.wikimedia.org/wiki/IRC_bots
[02:22:43] * Krinkle makes random other edits, thanks quiddity for the page
[02:29:33] \o/
[02:52:05] Critical Alert for device cr2-codfw.wikimedia.org - Primary inbound port utilisation over 80%
[02:57:05] C̶r̶i̶t̶i̶c̶a̶l Device cr2-codfw.wikimedia.org recovered from Primary inbound port utilisation over 80%
[03:05:15] (PS1) Krinkle: noc: Add [check replag] links to db.php [mediawiki-config] - https://gerrit.wikimedia.org/r/420153
[03:05:29] (CR) Krinkle: [C: 2] "noc" [mediawiki-config] - https://gerrit.wikimedia.org/r/420153 (owner: Krinkle)
[03:06:47] (Merged) jenkins-bot: noc: Add [check replag] links to db.php [mediawiki-config] - https://gerrit.wikimedia.org/r/420153 (owner: Krinkle)
[03:08:48] (CR) jenkins-bot: noc: Add [check replag] links to db.php [mediawiki-config] - https://gerrit.wikimedia.org/r/420153 (owner: Krinkle)
[03:09:17] !log krinkle@tin Synchronized docroot/noc/db.php: noc: I410a56431a (duration: 00m 59s)
[03:09:23] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[03:09:58] RoanKattouw: :) - https://noc.wikimedia.org/db.php?&
[03:11:32] quiddity: Look what you made me do :D ^
[03:15:51] :D
[03:26:19] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 698.74 seconds
[03:31:30] lgm
[04:06:19] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 290.02 seconds
[04:59:20] (PS1) Gergő Tisza: Log ReadingLists warnings [mediawiki-config] - https://gerrit.wikimedia.org/r/420155 (https://phabricator.wikimedia.org/T189340)
[05:50:40] (PS3) Gergő Tisza: Enable Wikidata description override on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/418843 (https://phabricator.wikimedia.org/T184000)
[05:52:39] (PS4) Gergő Tisza: Enable Wikidata description override on testwiki [mediawiki-config] - https://gerrit.wikimedia.org/r/418843 (https://phabricator.wikimedia.org/T184000)
[07:17:23] Operations, DBA, hardware-requests: Decommission db1009 - https://phabricator.wikimedia.org/T189216#4058198 (Marostegui) So, the checks finished and there were differences on testreduce_0715.results (173GB) table, between the following rows: ``` 40590911 40650121 ``` I have confirmed that this is not...
[09:35:20] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 23 probes of 302 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map
[09:40:20] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 9 probes of 302 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map
[10:57:55] (CR) Volans: "The full compilation did timeout after 180 minutes (needs to be increased a bit), but here are the results and the only failures are compi" [puppet] - https://gerrit.wikimedia.org/r/419709 (owner: Jcrespo)
[11:07:49] (CR) Volans: [C: 1] "LGTM" [puppet] - https://gerrit.wikimedia.org/r/406779 (https://phabricator.wikimedia.org/T185967) (owner: Volans)
[11:28:32] Operations, ops-codfw: attach furud's new arrays (furud-array[3-7]) - https://phabricator.wikimedia.org/T185153#4058298 (faidon)
[12:07:53] (CR) Jcrespo: "Can you check if re.match does a "contains" and not a "full-string match". I remember being the second (so it would need a .* at the end)," [puppet] - https://gerrit.wikimedia.org/r/420114 (https://phabricator.wikimedia.org/T188680) (owner: Bstorm)
[12:12:02] (CR) Jcrespo: "More like [0-9]+', but I hope you get my question." [puppet] - https://gerrit.wikimedia.org/r/420114 (https://phabricator.wikimedia.org/T188680) (owner: Bstorm)
[13:32:05] Critical Alert for device cr2-codfw.wikimedia.org - Primary inbound port utilisation over 80%
[13:37:06] C̶r̶i̶t̶i̶c̶a̶l Device cr2-codfw.wikimedia.org recovered from Primary inbound port utilisation over 80%
[14:18:06] Critical Alert for device cr2-eqiad.wikimedia.org - Primary outbound port utilisation over 80%
[14:28:06] C̶r̶i̶t̶i̶c̶a̶l Device cr2-eqiad.wikimedia.org recovered from Primary outbound port utilisation over 80%
[14:32:06] Critical Alert for device cr2-codfw.wikimedia.org - Primary inbound port utilisation over 80%
[14:37:05] C̶r̶i̶t̶i̶c̶a̶l Device cr2-codfw.wikimedia.org recovered from Primary inbound port utilisation over 80%
[14:52:24] (PS1) Andrew Bogott: import-wikitech: add --uploads to importDump [wikitech-static] - https://gerrit.wikimedia.org/r/420169
[14:52:40] (CR) Andrew Bogott: [V: 2 C: 2] import-wikitech: add --uploads to importDump [wikitech-static] - https://gerrit.wikimedia.org/r/420169 (owner: Andrew Bogott)
[15:32:05] Critical Alert for device cr2-codfw.wikimedia.org - Primary inbound port utilisation over 80%
[15:38:06] C̶r̶i̶t̶i̶c̶a̶l Device cr2-codfw.wikimedia.org recovered from Primary inbound port utilisation over 80%
[18:41:09] !log executed apt-get clean on scb1004 to free some space (root partition disk space warning)
[18:41:14] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
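
Note on the 12:07:53 review question about re.match: in Python's standard `re` module, `re.match` is neither a "contains" check nor a full-string match. It only anchors at the start of the string, so a trailing `.*` is not needed for it to succeed on a prefix. `re.search` is the "contains" variant and `re.fullmatch` requires the whole string to match. A minimal sketch follows; the pattern and sample strings are hypothetical, since the regex in change 420114 is not shown in this log, and the helper name `classify` is made up for illustration.

```python
import re


def classify(pattern, text):
    """Return (match, search, fullmatch) outcomes as booleans.

    Hypothetical helper for illustration; not taken from the change under review.
    """
    at_start = re.match(pattern, text) is not None     # must match at position 0
    anywhere = re.search(pattern, text) is not None    # "contains" semantics
    whole = re.fullmatch(pattern, text) is not None    # entire string must match
    return at_start, anywhere, whole


if __name__ == "__main__":
    # Assumed pattern, loosely based on the "[0-9]+" mentioned in the review.
    pattern = r"[0-9]+"
    for text in ["123", "123abc", "abc123"]:
        print(text, classify(pattern, text))
```

With this sketch, a string that starts with digits but has trailing characters passes `re.match` and fails `re.fullmatch`, while a string with digits only in the middle or at the end passes `re.search` alone.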