[00:11:31] RECOVERY - https://grafana.wikimedia.org/dashboard/db/varnish-http-requests grafana alert on einsteinium is OK: OK: https://grafana.wikimedia.org/dashboard/db/varnish-http-requests is not alerting.
[00:12:03] Operations, Gadgets, MediaWiki-Cache, Performance-Team: test.wp is using test2.wp's message cache - https://phabricator.wikimedia.org/T197450 (Legoktm) Open>Resolved >>! In T197450#4395486, @Krinkle wrote: > @Legoktm Can you provide an example of a missing message? I just looked at two ex...
[00:15:55] (PS1) Jforrester: Load TimedMediaHandler via static extension registration [mediawiki-config] - https://gerrit.wikimedia.org/r/448176 (https://phabricator.wikimedia.org/T140852)
[00:15:57] (PS1) Jforrester: Load TimedMediaHandler's i18n via static extension registration [mediawiki-config] - https://gerrit.wikimedia.org/r/448177 (https://phabricator.wikimedia.org/T140852)
[00:17:51] (CR) Jforrester: [C: -2] "Not 'til wmf.15 is everywhere and won't go back." [mediawiki-config] - https://gerrit.wikimedia.org/r/448176 (https://phabricator.wikimedia.org/T140852) (owner: Jforrester)
[00:21:11] (CR) Gehel: Add common base utility modules (6 comments) [software/spicerack] - https://gerrit.wikimedia.org/r/448047 (https://phabricator.wikimedia.org/T199079) (owner: Volans)
[00:21:13] (PS1) Reedy: wfLoadExtension for Sentry and LiquidThreads [mediawiki-config] - https://gerrit.wikimedia.org/r/448178
[00:23:08] (PS1) Reedy: Remove LQT config cruft [mediawiki-config] - https://gerrit.wikimedia.org/r/448179
[00:23:56] (CR) Jforrester: [C: +1] "Good whenever." [mediawiki-config] - https://gerrit.wikimedia.org/r/448178 (owner: Reedy)
[00:24:07] (CR) Jforrester: [C: +1] "Yuuuup." [mediawiki-config] - https://gerrit.wikimedia.org/r/448179 (owner: Reedy)
[00:24:10] (PS1) Reedy: Remove sentry $wmg -> $wg [mediawiki-config] - https://gerrit.wikimedia.org/r/448182
[00:26:57] (CR) BryanDavis: [C: +1] "+1 instead of +2 only because I don't have time to actually build all of the dependent images at the moment." [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/447622 (https://phabricator.wikimedia.org/T190274) (owner: Zhuyifei1999)
[00:27:09] Operations, DBA, JADE, Scoring-platform-team, TechCom-RFC: Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (awight)
[00:38:10] RECOVERY - Router interfaces on cr1-eqord is OK: OK: host 208.80.154.198, interfaces up: 39, down: 0, dormant: 0, excluded: 0, unused: 0
[00:38:21] RECOVERY - Router interfaces on cr1-ulsfo is OK: OK: host 198.35.26.192, interfaces up: 65, down: 0, dormant: 0, excluded: 0, unused: 0
[00:41:37] jouncebot: now
[00:41:37] No deployments scheduled for the next 82 hour(s) and 18 minute(s)
[01:09:11] (CR) EBernhardson: [C: +1] Upgrade to 6.3.1-alpha1 (without hebrew) [software/elasticsearch/plugins] - https://gerrit.wikimedia.org/r/446869 (https://phabricator.wikimedia.org/T199791) (owner: DCausse)
[01:10:24] hmm, had to chmod some .git/objects stuff (missing u+w, g+w)
[01:10:49] (CR) EBernhardson: [C: +1] search.wikimedia.org should properly handle multivalue separation char (0x1F) [mediawiki-config] - https://gerrit.wikimedia.org/r/446801 (owner: DCausse)
[01:19:58] !log aaron@deploy1001 Synchronized php-1.32.0-wmf.14/includes/libs/objectcache/BagOStuff.php: f208a431f912 (duration: 00m 56s)
[01:20:02] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[02:08:37] zeljkof: did you do any unusual commands lately? I always wonder how those sort of permission issues sprout up from time to time.
[02:10:07] bad umask?
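"bad umask?" was only a guess in the channel; the log does not establish the cause of the missing write bits. As a rough illustration of the idea, the sketch below shows how a process umask strips bits from the mode a program requests when it creates a file. The requested mode 0o666 and the umask values are illustrative assumptions, and `effective_mode`/`describe` are hypothetical helper names.

```python
import stat


def effective_mode(requested: int, umask: int) -> int:
    """Permission bits a newly created file ends up with: requested bits minus umask bits."""
    return requested & ~umask & 0o777


def describe(mode: int) -> str:
    """Render permission bits ls-style, e.g. 0o644 -> 'rw-r--r--'."""
    # stat.filemode() expects a full st_mode, so mark it as a regular file and drop the type char.
    return stat.filemode(mode | stat.S_IFREG)[1:]


if __name__ == "__main__":
    # Compare a few common umask values against a typical 0o666 request for a regular file.
    for mask in (0o022, 0o002, 0o077):
        mode = effective_mode(0o666, mask)
        group_writable = bool(mode & stat.S_IWGRP)
        print(f"umask {mask:03o}: {describe(mode)} group-writable={group_writable}")
```

With umask 022, new files come out without g+w, which is one way a shared checkout ends up needing a manual `chmod g+w`. Missing u+w would need a different explanation, since 022 leaves the owner write bit alone.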
[02:33:29] (CR) Krinkle: [C: +1] Remove LQT config cruft [mediawiki-config] - https://gerrit.wikimedia.org/r/448179 (owner: Reedy)
[02:35:11] (CR) Krinkle: [C: +1] "I know you know, but just in case: IS.php must sync first if deployed as one commit, and it won't be testable on mwdebug unless manually p" [mediawiki-config] - https://gerrit.wikimedia.org/r/448182 (owner: Reedy)
[03:21:39] (PS2) BBlack: p::cache::base: fix storage_parts typo [puppet] - https://gerrit.wikimedia.org/r/448075
[03:22:19] (CR) BBlack: [C: +2] p::cache::base: fix storage_parts typo [puppet] - https://gerrit.wikimedia.org/r/448075 (owner: BBlack)
[03:25:36] (PS2) BBlack: storage config tweaks for cp1075-99 [puppet] - https://gerrit.wikimedia.org/r/448076 (https://phabricator.wikimedia.org/T195923)
[03:26:31] PROBLEM - MariaDB Slave Lag: s1 on dbstore1002 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 943.06 seconds
[03:29:11] (CR) BBlack: [C: +2] storage config tweaks for cp1075-99 [puppet] - https://gerrit.wikimedia.org/r/448076 (https://phabricator.wikimedia.org/T195923) (owner: BBlack)
[03:39:10] RECOVERY - MariaDB Slave Lag: s1 on dbstore1002 is OK: OK slave_sql_lag Replication lag: 126.82 seconds
[03:41:50] (CR) Krinkle: search.wikimedia.org should properly handle multivalue separation char (0x1F) (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/446801 (owner: DCausse)
[03:54:42] (PS1) BBlack: installer late_command: run nvme inside target [puppet] - https://gerrit.wikimedia.org/r/448300 (https://phabricator.wikimedia.org/T195923)
[03:55:07] (CR) BBlack: [V: +2 C: +2] installer late_command: run nvme inside target [puppet] - https://gerrit.wikimedia.org/r/448300 (https://phabricator.wikimedia.org/T195923) (owner: BBlack)
[03:55:49] Operations, Wikimedia-Apache-configuration, Chinese-Sites, Patch-For-Review, User-Urbanecm: All "zh-my" variant page views get 404 Not Found on zh.wikipedia.org - https://phabricator.wikimedia.org/T198371 (Shizhao) Open>Resolved
[04:15:50] (PS1) BBlack: late_command: fix sfdisk path [puppet] - https://gerrit.wikimedia.org/r/448306 (https://phabricator.wikimedia.org/T195923)
[04:16:15] (CR) BBlack: [C: +2] late_command: fix sfdisk path [puppet] - https://gerrit.wikimedia.org/r/448306 (https://phabricator.wikimedia.org/T195923) (owner: BBlack)
[04:49:07] (CR) Zhuyifei1999: [C: +2] Add libmysqlclient-dev to python 3 base docker image [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/447622 (https://phabricator.wikimedia.org/T190274) (owner: Zhuyifei1999)
[04:49:30] (Merged) jenkins-bot: Add libmysqlclient-dev to python 3 base docker image [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/447622 (https://phabricator.wikimedia.org/T190274) (owner: Zhuyifei1999)
[04:51:16] (PS1) Marostegui: db-eqiad.php: Depool db1094 [mediawiki-config] - https://gerrit.wikimedia.org/r/448317
[04:55:51] (CR) Marostegui: [C: +2] db-eqiad.php: Depool db1094 [mediawiki-config] - https://gerrit.wikimedia.org/r/448317 (owner: Marostegui)
[04:57:05] (Merged) jenkins-bot: db-eqiad.php: Depool db1094 [mediawiki-config] - https://gerrit.wikimedia.org/r/448317 (owner: Marostegui)
[04:58:05] (CR) jenkins-bot: db-eqiad.php: Depool db1094 [mediawiki-config] - https://gerrit.wikimedia.org/r/448317 (owner: Marostegui)
[04:58:16] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1094 (duration: 00m 57s)
[04:58:19] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:59:17] !log Deploy schema change on db1094 T144010 T51190 T199368
[04:59:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[04:59:22] T51190: Truncate SHA-1 indexes - https://phabricator.wikimedia.org/T51190
[04:59:23] T144010: Drop eu_touched in production - https://phabricator.wikimedia.org/T144010
[04:59:23] T199368: Convert UNIQUE INDEX to PK in Production - https://phabricator.wikimedia.org/T199368
[05:13:37] (PS1) Marostegui: Revert "db-eqiad.php: Depool db1094" [mediawiki-config] - https://gerrit.wikimedia.org/r/448323
[05:17:14] (CR) Marostegui: [C: +2] Revert "db-eqiad.php: Depool db1094" [mediawiki-config] - https://gerrit.wikimedia.org/r/448323 (owner: Marostegui)
[05:18:31] (Merged) jenkins-bot: Revert "db-eqiad.php: Depool db1094" [mediawiki-config] - https://gerrit.wikimedia.org/r/448323 (owner: Marostegui)
[05:18:44] (CR) jenkins-bot: Revert "db-eqiad.php: Depool db1094" [mediawiki-config] - https://gerrit.wikimedia.org/r/448323 (owner: Marostegui)
[05:19:39] !log marostegui@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1094 (duration: 00m 55s)
[05:19:42] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:55:11] PROBLEM - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is CRITICAL: CRITICAL - failed 29 probes of 309 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map
[05:59:20] !log reboot webperf1002, webperf2002 for new disk to appear T199853
[05:59:24] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[05:59:24] T199853: Increase webperf1002/webperf2002 space from 50GB to 150GB (Ganeti) - https://phabricator.wikimedia.org/T199853
[06:00:21] RECOVERY - IPv6 ping to eqiad on ripe-atlas-eqiad IPv6 is OK: OK - failed 13 probes of 309 (alerts on 19) - https://atlas.ripe.net/measurements/1790947/#!map
[06:01:01] PROBLEM - Host webperf2002 is DOWN: PING CRITICAL - Packet loss = 100%
[06:01:10] PROBLEM - Host webperf1002 is DOWN: PING CRITICAL - Packet loss = 100%
[06:16:21] PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on einsteinium is CRITICAL: cluster=cache_upload site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[06:18:40] RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[06:19:48] ema: whenever you have time can you explain me --^
[06:27:48] hmmm
[06:27:55] so ... predictable interface names ?
[06:28:01] not so predictable after all
[06:28:15] webperf1002 and webperf2002 are unreachable currently
[06:28:22] I logged in via console and ...
[06:28:31] 2: ens6: mtu 1500 qdisc noop state DOWN group default qlen 1000
[06:28:35] but .. config has ...
[06:28:40] allow-hotplug ens5
[06:28:40] iface ens5 inet static
[06:28:46] lovely
[06:28:55] so adding a hard disk ended up in the interface changing name
[06:29:03] /o\
[06:29:12] a hard disk!!!
[06:29:47] so what ? I added a disk and the PCIe slot index changed ?
[06:31:08] 00:05.0 SCSI storage controller: Red Hat, Inc Virtio block device
[06:31:11] yup
[06:31:15] that's exactly what happened
[06:31:39] sigh
[06:32:10] PROBLEM - puppet last run on seaborgium is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 6 minutes ago with 1 failures. Failed resources (up to 3 shown): File[/usr/local/bin/gen_fingerprints]
[06:32:40] RECOVERY - Host webperf1002 is UP: PING OK - Packet loss = 0%, RTA = 0.50 ms
[06:33:21] akosiaris: (curious) so adding a hd changed all the other slot indexes, and hence the interface name?
[06:33:40] elukey: yup. Look at https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/ for an explanation
[06:33:40] RECOVERY - Host webperf2002 is UP: PING OK - Packet loss = 0%, RTA = 36.28 ms
[06:34:10] so policy 1) is not applicable here it seems
[06:34:19] as KVM seems to add network cards as PCIe
[06:34:27] so policy 2) takes place
[06:34:48] ahhhh
[06:35:35] I am wondering why policy 2 is preferable to policy 3
[06:35:42] I guess there is some reason I am missing
[06:36:23] not that it would change anything in this case
[06:36:37] e.g. on my machine, I have enp3s0f0
[06:36:51] for a 03:00.0 Ethernet controller: Broadcom Limited NetXtreme BCM57766 Gigabit Ethernet PCIe (rev 01)
[06:37:07] so still PCI slot seems to be the very first
[06:38:14] er no that's not correct
[06:38:57] Slot: 03:00.0 vs Slot: 00:05.0
[06:39:44] AaronSchulz: I don't think I did any unusual commands, I'm new to train so just following the docs
[06:40:21] the format is [domain:]bus:device.function
[06:40:45] so enp3s0f0 is pci bus 3, device 0, function 0
[06:43:10] ah and the reason policy 2 takes place in the VM is that the card is hotpluggable
[06:43:20] whereas the nic is not on my machine
[06:43:41] sigh.. this must be like the 20th time I revisit this page and every time I find something new
[06:46:05] (PS2) Elukey: EventStreams: Use the default log level (warn) [puppet] - https://gerrit.wikimedia.org/r/448152 (owner: Mobrovac)
[06:47:05] (CR) Elukey: [C: +2] EventStreams: Use the default log level (warn) [puppet] - https://gerrit.wikimedia.org/r/448152 (owner: Mobrovac)
[06:51:41] PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on einsteinium is CRITICAL: cluster=cache_upload site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[06:54:01] RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[06:54:11] PROBLEM - Check systemd state on webperf2002 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[06:55:27] Operations, vm-requests, Performance-Team (Radar): Increase webperf1002/webperf2002 space from 50GB to 150GB (Ganeti) - https://phabricator.wikimedia.org/T199853 (akosiaris) Open>Resolved On both webperf1002 and webperf2002 we have `/dev/vdb 147G 331M 139G 1% /srv` I 've mounted...
[06:55:29] Operations, Performance-Team, monitoring, Patch-For-Review: Consolidate performance website and related software - https://phabricator.wikimedia.org/T158837 (akosiaris)
[06:57:50] RECOVERY - puppet last run on seaborgium is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[07:06:18] Operations, DBA, JADE, Scoring-platform-team, TechCom-RFC: Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (jcrespo) Please rename the proposal- we have absolutely no issue with this being a new namespace or how it is named...
[07:11:41] (PS2) Alexandros Kosiaris: Bump Deployment apiVersion to apps/v1beta2 [deployment-charts] - https://gerrit.wikimedia.org/r/448045
[07:17:13] Operations, DBA, Traffic, Patch-For-Review: Framework to transfer files over the LAN - https://phabricator.wikimedia.org/T156462 (jcrespo) The original scope isn't met by far: * No throttling except it is easy to implement with pv * It is not intelligent * Compression is only on/off, not configur...
[07:21:12] (PS1) Elukey: mediawiki::apache::wikimedia.conf: avoid inline comments for ServerAlias [puppet] - https://gerrit.wikimedia.org/r/448384
[07:22:54] (CR) Elukey: [C: +2] mediawiki::apache::wikimedia.conf: avoid inline comments for ServerAlias [puppet] - https://gerrit.wikimedia.org/r/448384 (owner: Elukey)
[07:23:51] testing on mw1266
[07:23:58] (with puppet disabled)
[07:25:32] (CR) Alexandros Kosiaris: [V: +2 C: +2] "Tested in minikube, worked fine" [deployment-charts] - https://gerrit.wikimedia.org/r/448045 (owner: Alexandros Kosiaris)
[07:36:13] all right doing mw2* and tailing mwlog1001's apache2.log just in case
[07:41:19] Operations, DBA, JADE, Scoring-platform-team, TechCom-RFC: Introduce a new namespace for collaborative judgments about wiki entities - https://phabricator.wikimedia.org/T200297 (awight) >>! In T200297#4455344, @jcrespo wrote: > Please rename the proposal- we have absolutely no issue with this...
[07:45:16] (PS2) Alexandros Kosiaris: Remove grafana-admin.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/442306 (https://phabricator.wikimedia.org/T170150)
[07:45:22] (CR) Alexandros Kosiaris: [V: +2 C: +2] Remove grafana-admin.wikimedia.org [dns] - https://gerrit.wikimedia.org/r/442306 (https://phabricator.wikimedia.org/T170150) (owner: Alexandros Kosiaris)
[07:47:17] Operations, monitoring, Patch-For-Review: Evaluate Grafana's LDAP group options and deprecate grafana-admin if possible - https://phabricator.wikimedia.org/T170150 (akosiaris) And I 've just removed the grafana-admin.wikimedia.org DNS RR. In the last few days the number of accesses to that virtualhos...
[07:49:18] and now finally starting eqiad puppet runs
[07:51:53] (PS1) Alexandros Kosiaris: osm: Populate the osmupdater user [puppet] - https://gerrit.wikimedia.org/r/448393 (https://phabricator.wikimedia.org/T197246)
[07:52:26] (CR) jerkins-bot: [V: -1] osm: Populate the osmupdater user [puppet] - https://gerrit.wikimedia.org/r/448393 (https://phabricator.wikimedia.org/T197246) (owner: Alexandros Kosiaris)
[07:59:54] (Abandoned) Jcrespo: mariadb: Do not add host to production prometheus if it is on cloud [puppet] - https://gerrit.wikimedia.org/r/420683 (https://phabricator.wikimedia.org/T171203) (owner: Jcrespo)
[08:03:06] (CR) Alexandros Kosiaris: [V: +2 C: +2] "I 'll override jenkins on this one. This is old code and should be deprecated/removed, not fixed." [puppet] - https://gerrit.wikimedia.org/r/448393 (https://phabricator.wikimedia.org/T197246) (owner: Alexandros Kosiaris)
[08:03:35] Operations, ops-eqiad, Cloud-VPS, cloud-services-team: Rack/cable/configure asw2-b-eqiad switch stack - https://phabricator.wikimedia.org/T183585 (Gehel) elastic*, logstash* and wdqs* should be entirely transparent. Elastic might scream a bit about too many unallocated shards, but at most this sh...
[08:11:56] (PS5) Jcrespo: dbtree: move dbtree outside of mwmaint hosts [puppet] - https://gerrit.wikimedia.org/r/445597 (https://phabricator.wikimedia.org/T192092)
[08:12:46] (PS6) Jcrespo: dbtree: move dbtree outside of mwmaint hosts [puppet] - https://gerrit.wikimedia.org/r/445597 (https://phabricator.wikimedia.org/T192092)
[08:12:49] (CR) Jcrespo: "New version" (8 comments) [puppet] - https://gerrit.wikimedia.org/r/445597 (https://phabricator.wikimedia.org/T192092) (owner: Jcrespo)
[08:12:56] (CR) Gehel: [C: -1] "Looks good, minor comments inline." (2 comments) [puppet] - https://gerrit.wikimedia.org/r/447851 (https://phabricator.wikimedia.org/T194787) (owner: MSantos)
[08:18:13] (PS2) DCausse: search.wikimedia.org should properly handle multivalue separation char (0x1F) [mediawiki-config] - https://gerrit.wikimedia.org/r/446801
[08:18:25] (CR) DCausse: search.wikimedia.org should properly handle multivalue separation char (0x1F) (1 comment) [mediawiki-config] - https://gerrit.wikimedia.org/r/446801 (owner: DCausse)
[08:25:11] (PS7) Jcrespo: dbtree: move dbtree outside of mwmaint hosts [puppet] - https://gerrit.wikimedia.org/r/445597 (https://phabricator.wikimedia.org/T192092)
[08:29:29] (CR) MarcoAurelio: "> talked briefly on IRC about it with bd808. there is a plan to use" [dns] - https://gerrit.wikimedia.org/r/441817 (https://phabricator.wikimedia.org/T189637) (owner: MarcoAurelio)
[08:32:54] (PS8) Jcrespo: dbtree: move dbtree outside of mwmaint hosts [puppet] - https://gerrit.wikimedia.org/r/445597 (https://phabricator.wikimedia.org/T192092)
[08:33:23] (CR) MarcoAurelio: "In order to simplify, let's just allow bureaucrats to add and remove this permission locally maybe? The 'danger' is who gets access to it." [mediawiki-config] - https://gerrit.wikimedia.org/r/440676 (owner: Gergő Tisza)
[08:36:51] !log ladsgroup@mwmaint1001:~$ foreachwikiindblist s3 populateChangeTagDef.php --sleep 2 (T193873)
[08:36:55] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[08:36:56] T193873: Run maintenance script to populate change_tag_def on WMF production (all wikis) - https://phabricator.wikimedia.org/T193873
[08:38:13] (CR) Jcrespo: [C: +1] "I would like to merge as is: https://puppet-compiler.wmflabs.org/compiler02/11888/" [puppet] - https://gerrit.wikimedia.org/r/445597 (https://phabricator.wikimedia.org/T192092) (owner: Jcrespo)
[08:38:59] (CR) Jcrespo: [C: +1] "Dzhan: Moving dbtree to a jessie host should unblock the decom of terbium." [puppet] - https://gerrit.wikimedia.org/r/445597 (https://phabricator.wikimedia.org/T192092) (owner: Jcrespo)
[08:40:33] (PS6) Jcrespo: phabricator/mariadb: Update database configuration for stretch/10.1 [puppet] - https://gerrit.wikimedia.org/r/377693 (https://phabricator.wikimedia.org/T175679)
[08:41:01] (CR) Jcrespo: [C: +2] phabricator/mariadb: Update database configuration for stretch/10.1 [puppet] - https://gerrit.wikimedia.org/r/377693 (https://phabricator.wikimedia.org/T175679) (owner: Jcrespo)
[08:48:05] (PS2) Jcrespo: mariadb: Add tokudb support for analytics eventlogging nodes [puppet] - https://gerrit.wikimedia.org/r/356648
[08:48:59] (CR) Jcrespo: "Luca, this is still blocked on you to ok or not the change. I have rebased it." [puppet] - https://gerrit.wikimedia.org/r/356648 (owner: Jcrespo)
[08:50:03] (PS7) Jcrespo: [WIP]Remove $::mw_primary variable from puppet [puppet] - https://gerrit.wikimedia.org/r/345346 (https://phabricator.wikimedia.org/T156924)
[08:51:28] This IRC spam is really walking on my nerves...
[08:56:59] (CR) Elukey: [C: +1] mariadb: Add tokudb support for analytics eventlogging nodes [puppet] - https://gerrit.wikimedia.org/r/356648 (owner: Jcrespo)
[09:00:12] !log adjust aggregation to 'sum' for MediaWiki.edit sum metrics - T199968
[09:00:15] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:00:16] T199968: Investigate odd aggregation of MediaWiki.edit.failures.conflict.sum metric in graphite - https://phabricator.wikimedia.org/T199968
[09:00:40] (PS1) Ema: vhtcpd (0.1.1-2) stretch-wikimedia; urgency=medium [software/varnish/vhtcpd] (debian) - https://gerrit.wikimedia.org/r/448424 (https://phabricator.wikimedia.org/T200445)
[09:09:56] (PS2) Ema: vhtcpd (0.1.1-2) stretch-wikimedia; urgency=medium [software/varnish/vhtcpd] (debian) - https://gerrit.wikimedia.org/r/448424 (https://phabricator.wikimedia.org/T200445)
[09:10:12] (CR) Tarrow: "I've removed the role from the instances in horizon and the mentions in Hiera. So I think you can now get this merged" [puppet] - https://gerrit.wikimedia.org/r/447564 (owner: EBernhardson)
[09:18:36] (PS3) Ema: vhtcpd (0.1.1-2) stretch-wikimedia; urgency=medium [software/varnish/vhtcpd] (debian) - https://gerrit.wikimedia.org/r/448424 (https://phabricator.wikimedia.org/T200445)
[09:18:41] Operations, monitoring, Patch-For-Review: grafana fails to load dashboards from disk - https://phabricator.wikimedia.org/T200317 (fgiunchedi) Open>Resolved a:fgiunchedi I checked grafana 5.2 and it correctly skips invalid dashboards from disk, mentioning which dashboards are failing to lo...
[09:21:59] Operations, User-notice: 2018 data center switchover: Move all the things over to codfw - https://phabricator.wikimedia.org/T200022 (Trizek-WMF) Thanks James! >>! In T200022#4453382, @Jdforrester-WMF wrote: > I'm sure @Johan is on top of actually advertising it. :-) Well, while he is off, I'm in his sh...
[09:27:57] (CR) Ema: [C: +2] vhtcpd (0.1.1-2) stretch-wikimedia; urgency=medium [software/varnish/vhtcpd] (debian) - https://gerrit.wikimedia.org/r/448424 (https://phabricator.wikimedia.org/T200445) (owner: Ema)
[09:28:30] (PS8) Jcrespo: [WIP]Remove $::mw_primary variable from puppet [puppet] - https://gerrit.wikimedia.org/r/345346 (https://phabricator.wikimedia.org/T156924)
[09:37:07] !log vhtcpd 0.1.1-2 uploaded to stretch-wikimedia T200445
[09:37:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[09:37:11] T200445: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445
[09:37:46] Operations, Traffic, Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (ema)
[09:44:08] Operations, CommRel-Specialists-Support (Jul-Sep-2018), User-Johan: Community Relations support for the 2018 data center switchover - https://phabricator.wikimedia.org/T199676 (Elitre)
[09:45:18] (PS9) Jcrespo: [WIP]Remove $::mw_primary variable from puppet [puppet] - https://gerrit.wikimedia.org/r/345346 (https://phabricator.wikimedia.org/T156924)
[09:46:50] (CR) Jcrespo: "So this version works, as in it doesn't error out horribly, but I am not sure it does what it should." [puppet] - https://gerrit.wikimedia.org/r/345346 (https://phabricator.wikimedia.org/T156924) (owner: Jcrespo)
[09:48:45] (CR) Jcrespo: [WIP]Remove $::mw_primary variable from puppet (1 comment) [puppet] - https://gerrit.wikimedia.org/r/345346 (https://phabricator.wikimedia.org/T156924) (owner: Jcrespo)
[09:51:44] Amir1: I guess you are talking about spam in private messages, do you know you can block that? (let me know if you need help)
[09:52:23] zeljkof: I block them when they arrive but there are so many new accounts, etc.
[09:53:13] Amir1: I just saw mail from bd808 about blocking unregistered users by default, let me see the instructions
[09:53:50] PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on einsteinium is CRITICAL: cluster=cache_upload site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[09:54:27] Amir1: ah, it's on a private list, but I think the instructions are not private, I'll ping you
[09:54:40] PROBLEM - HTTP availability for Varnish at ulsfo on einsteinium is CRITICAL: job=varnish-upload site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[09:54:43] thanks!
[09:55:50] PROBLEM - puppet last run on db1102 is CRITICAL: CRITICAL: Puppet has 1 failures. Last run 5 minutes ago with 1 failures. Failed resources (up to 3 shown): Exec[pt-heartbeat-kill]
[09:56:01] RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1
[09:57:00] RECOVERY - HTTP availability for Varnish at ulsfo on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=3&fullscreen&refresh=1m&orgId=1
[09:57:52] ulsfo is depooled since yesterday because of outages on the links ^
[09:58:39] ema: ahh okok so the nginx availability drops were network links not working?
[10:04:03] elukey: I think so, yes
[10:21:30] RECOVERY - puppet last run on db1102 is OK: OK: Puppet is currently enabled, last run 11 seconds ago with 0 failures
[10:35:22] Operations, Performance-Team, Traffic, Wikimedia-General-or-Unknown: Search engines continue to link to JS-redirect destination after Wikipedia copyright protest - https://phabricator.wikimedia.org/T199252 (Deskana) @Imarlier I don't have access to Google Webmaster Tools any more, nor was I parti...
[10:59:32] Operations, DNS, Release-Engineering-Team, Traffic, and 4 others: Move Foundation Wiki to new URL when new Wikimedia Foundation website launches - https://phabricator.wikimedia.org/T188776 (JAllemandou)
[11:27:14] (PS1) Ema: libvmod-netmapper (1.7-2) stretch-wikimedia; urgency=medium [software/varnish/libvmod-netmapper] (debian) - https://gerrit.wikimedia.org/r/448470 (https://phabricator.wikimedia.org/T200445)
[11:31:05] !log mobrovac@deploy1001 Started deploy [eventstreams/deploy@941e3cf]: Make the main processing loop async - T199813
[11:31:09] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:31:10] T199813: EventStreams accumulates too much memory on SCB nodes in CODFW - https://phabricator.wikimedia.org/T199813
[11:33:07] !log mobrovac@deploy1001 Finished deploy [eventstreams/deploy@941e3cf]: Make the main processing loop async - T199813 (duration: 02m 02s)
[11:33:11] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[11:37:39] (PS1) Jcrespo: mariadb: Allow reimage of db1092 and db1094 [puppet] - https://gerrit.wikimedia.org/r/448474
[11:39:11] (PS2) Jcrespo: mariadb: Allow reimage of db1092 and db1094 [puppet] - https://gerrit.wikimedia.org/r/448474
[11:44:30] Operations, ops-codfw, Analytics, Analytics-Kanban, and 6 others: EventStreams accumulates too much memory on SCB nodes in CODFW - https://phabricator.wikimedia.org/T199813 (mobrovac)
[11:45:13] (PS1) Jcrespo: mariadb: Depool db1094 for reimage [mediawiki-config] - https://gerrit.wikimedia.org/r/448476
[12:03:48] (PS1) Rduran: Add more test cases for CuminExecution asn transferer [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/448488
[12:10:17] (PS10) Jcrespo: Remove $::mw_primary variable from puppet [puppet] - https://gerrit.wikimedia.org/r/345346 (https://phabricator.wikimedia.org/T156924)
[12:10:36] (CR) Alexandros Kosiaris: [C: +1] "Per the XXX comment this probably requires some more testing, but +1 on premise" [puppet] - https://gerrit.wikimedia.org/r/447804 (owner: BBlack)
[12:13:13] (CR) Alexandros Kosiaris: [C: -1] tmpfs privkeys [3/3]: use for tlsproxy (1 comment) [puppet] - https://gerrit.wikimedia.org/r/447805 (owner: BBlack)
[12:20:51] (CR) Jcrespo: Add more test cases for CuminExecution asn transferer (1 comment) [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/448488 (owner: Rduran)
[12:22:57] (CR) Rduran: ">" (1 comment) [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/448488 (owner: Rduran)
[12:24:27] (CR) Jcrespo: "> >" (1 comment) [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/448488 (owner: Rduran)
[12:31:41] (PS1) Mark Bergsma: Remove Travis CI build environment dist setting [debs/pybal] - https://gerrit.wikimedia.org/r/448498
[12:34:12] (PS2) Rduran: Add more test cases for CuminExecution asn transferer [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/448488
[12:34:14] (CR) Mark Bergsma: [C: +2] Remove Travis CI build environment dist setting [debs/pybal] - https://gerrit.wikimedia.org/r/448498 (owner: Mark Bergsma)
[12:36:45] (PS1) Rduran: Return an error when the sizes are different [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/448501
[12:37:29] (PS2) Mark Bergsma: Remove Travis CI build environment dist setting [debs/pybal] - https://gerrit.wikimedia.org/r/448498
[12:39:22] (PS3) Mark Bergsma: Remove Travis CI build environment dist setting [debs/pybal] - https://gerrit.wikimedia.org/r/448498
[12:41:21] (PS1) Jcrespo: monitoring: Harmonize check naming to a common set of rules [puppet] - https://gerrit.wikimedia.org/r/448503
[12:43:04] (PS2) Jcrespo: monitoring: Harmonize check naming to a common set of rules [puppet] - https://gerrit.wikimedia.org/r/448503
[12:44:21] (PS3) Jcrespo: monitoring: Harmonize check naming to a common set of rules [puppet] - https://gerrit.wikimedia.org/r/448503
[12:46:41] (CR) Jcrespo: "I would even go beyond these rules and "ban" Status/state, too (all checks are for a state)." [puppet] - https://gerrit.wikimedia.org/r/448503 (owner: Jcrespo)
[12:46:55] (PS13) Vgutierrez: WIP: provide ACMEv2 support based on certbot/acme library [software/certcentral] - https://gerrit.wikimedia.org/r/446618 (https://phabricator.wikimedia.org/T199717)
[12:47:38] (CR) jerkins-bot: [V: -1] WIP: provide ACMEv2 support based on certbot/acme library [software/certcentral] - https://gerrit.wikimedia.org/r/446618 (https://phabricator.wikimedia.org/T199717) (owner: Vgutierrez)
[12:48:13] (Abandoned) Vgutierrez: Provide a valid pebble config for the integration tests [software/certcentral] - https://gerrit.wikimedia.org/r/448020 (https://phabricator.wikimedia.org/T200405) (owner: Vgutierrez)
[12:48:40] (CR) Marostegui: [C: +1] mariadb: Depool db1094 for reimage [mediawiki-config] - https://gerrit.wikimedia.org/r/448476 (owner: Jcrespo)
[12:49:05] (CR) Jcrespo: "Thanks!" [software/wmfmariadbpy] - https://gerrit.wikimedia.org/r/448488 (owner: Rduran)
[12:49:18] (CR) Ema: [C: +2] "recheck" [software/varnish/vhtcpd] (debian) - https://gerrit.wikimedia.org/r/448424 (https://phabricator.wikimedia.org/T200445) (owner: Ema)
[12:53:07] (PS2) Ema: libvmod-netmapper (1.7-2) stretch-wikimedia; urgency=medium [software/varnish/libvmod-netmapper] (debian) - https://gerrit.wikimedia.org/r/448470 (https://phabricator.wikimedia.org/T200445)
[12:54:25] (PS1) Aklapper: Phab: Allow aklapper to delete personal Herald filter rules [puppet] - https://gerrit.wikimedia.org/r/448505
[12:55:22] (CR) Ema: "recheck" [software/varnish/libvmod-netmapper] (debian) - https://gerrit.wikimedia.org/r/448470 (https://phabricator.wikimedia.org/T200445) (owner: Ema)
[12:56:29] (PS1) BBlack: cp/lvs: add bnxt_en support in NIC tuning stuff [puppet] - https://gerrit.wikimedia.org/r/448506 (https://phabricator.wikimedia.org/T195923)
[12:56:33] (PS1) BBlack: cp1075-90: numa_networking: on [puppet] - https://gerrit.wikimedia.org/r/448507 (https://phabricator.wikimedia.org/T195923)
[12:56:37] (PS1) BBlack: cp1075-99: define in site.pp [puppet] - https://gerrit.wikimedia.org/r/448508 (https://phabricator.wikimedia.org/T195923)
[12:57:08] (CR) jerkins-bot: [V: -1] cp/lvs: add bnxt_en support in NIC tuning stuff [puppet] - https://gerrit.wikimedia.org/r/448506 (https://phabricator.wikimedia.org/T195923) (owner: BBlack)
[12:57:13] (PS2) BBlack: cp1075-90: define in site.pp [puppet] - https://gerrit.wikimedia.org/r/448508 (https://phabricator.wikimedia.org/T195923)
[12:57:22] (CR) Jcrespo: [C: +2] mariadb: Allow reimage of db1092 and db1094 [puppet] - https://gerrit.wikimedia.org/r/448474 (owner: Jcrespo)
[12:58:47] (CR) Vgutierrez: "At this point integration tests should pass locally iff pebble is in your PATH. On our CI environment they should pass when https://gerrit" [software/certcentral] - https://gerrit.wikimedia.org/r/446618 (https://phabricator.wikimedia.org/T199717) (owner: Vgutierrez)
[12:59:49] (PS2) BBlack: cp/lvs: add bnxt_en support in NIC tuning stuff [puppet] - https://gerrit.wikimedia.org/r/448506 (https://phabricator.wikimedia.org/T195923)
[12:59:51] (PS2) BBlack: cp1075-90: numa_networking: on [puppet] - https://gerrit.wikimedia.org/r/448507 (https://phabricator.wikimedia.org/T195923)
[12:59:53] (PS3) BBlack: cp1075-90: define in site.pp [puppet] - https://gerrit.wikimedia.org/r/448508 (https://phabricator.wikimedia.org/T195923)
[13:00:11] (PS3) Ema: libvmod-netmapper (1.7-2) stretch-wikimedia; urgency=medium [software/varnish/libvmod-netmapper] (debian) - https://gerrit.wikimedia.org/r/448470 (https://phabricator.wikimedia.org/T200445)
[13:00:48] (PS1) Mark Bergsma: Remove installation of tox in .travis.yml [debs/pybal] - https://gerrit.wikimedia.org/r/448513
[13:02:03] (PS2) Mark Bergsma: Remove installation of tox in .travis.yml [debs/pybal] - https://gerrit.wikimedia.org/r/448513
[13:03:22] (CR) Mark Bergsma: [C: +2] Remove installation of tox in .travis.yml [debs/pybal] - https://gerrit.wikimedia.org/r/448513 (owner: Mark Bergsma)
[13:03:25] (CR) Ema: [C: +2] libvmod-netmapper (1.7-2) stretch-wikimedia; urgency=medium [software/varnish/libvmod-netmapper] (debian) - https://gerrit.wikimedia.org/r/448470 (https://phabricator.wikimedia.org/T200445) (owner: Ema)
[13:03:57] (Merged) jenkins-bot: Remove installation of tox in .travis.yml [debs/pybal] - https://gerrit.wikimedia.org/r/448513 (owner: Mark Bergsma)
[13:06:50] (CR) Jcrespo: [C: +2] mariadb: Depool db1094 for reimage [mediawiki-config] - https://gerrit.wikimedia.org/r/448476 (owner: Jcrespo)
[13:08:30] (Merged) jenkins-bot: mariadb: Depool db1094 for reimage [mediawiki-config] -
10https://gerrit.wikimedia.org/r/448476 (owner: 10Jcrespo) [13:09:54] !log jynus@deploy1001 Synchronized wmf-config/db-eqiad.php: Depool db1094 for reimage (duration: 00m 54s) [13:09:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:10:50] PROBLEM - HTTP availability for Nginx -SSL terminators- at ulsfo on einsteinium is CRITICAL: cluster=cache_upload site=ulsfo https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [13:13:10] RECOVERY - HTTP availability for Nginx -SSL terminators- at ulsfo on einsteinium is OK: All metrics within thresholds. https://grafana.wikimedia.org/dashboard/db/frontend-traffic?panelId=4&fullscreen&refresh=1m&orgId=1 [13:13:32] uh, that sounds fun [13:15:02] jynus: that's ulsfo, depooled since yesterday because of network issues [13:15:10] oh, I didn't know about that [13:15:12] thank you [13:15:21] I was looking at traffic and it was low [13:15:28] very :) [13:15:29] but I thought it was only becaus of the time [13:15:41] (03CR) 10Filippo Giunchedi: "Agreed with the rationale, there's likely other checks to change too" (031 comment) [puppet] - 10https://gerrit.wikimedia.org/r/448503 (owner: 10Jcrespo) [13:15:57] !log libvmod-netmapper 1.7-2 uploaded to stretch-wikimedia T200445 [13:16:00] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [13:16:01] T200445: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 [13:16:18] I still see 4K 200 per minute! 
[13:16:44] (CR) jenkins-bot: mariadb: Depool db1094 for reimage [mediawiki-config] - https://gerrit.wikimedia.org/r/448476 (owner: Jcrespo)
[13:17:00] oh, I was misled by last week's metrics
[13:17:21] there is only 64
[13:17:26] :-/
[13:18:26] I'll silence those until Monday
[13:19:13] (PS1) Ema: libvmod-re2 (1.3.1-2) stretch-wikimedia; urgency=low [software/varnish/libvmod-re2] (debian) - https://gerrit.wikimedia.org/r/448517 (https://phabricator.wikimedia.org/T200445)
[13:24:11] (CR) Ema: [C: 2] libvmod-re2 (1.3.1-2) stretch-wikimedia; urgency=low [software/varnish/libvmod-re2] (debian) - https://gerrit.wikimedia.org/r/448517 (https://phabricator.wikimedia.org/T200445) (owner: Ema)
[13:27:15] Operations, Traffic, Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (ema)
[13:27:53] !log libvmod-re2 1.3.1-2 uploaded to stretch-wikimedia T200445
[13:27:57] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[13:27:58] T200445: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445
[13:37:30] Operations, Performance-Team, Traffic, Wikimedia-General-or-Unknown: Search engines continue to link to JS-redirect destination after Wikipedia copyright protest - https://phabricator.wikimedia.org/T199252 (Imarlier) >>! In T199252#4454769, @dr0ptp4kt wrote: > What's needed? @dr0ptp4kt within Se...
[13:41:32] (PS1) Ema: varnish: override default cc_command [puppet] - https://gerrit.wikimedia.org/r/448525 (https://phabricator.wikimedia.org/T200445)
[13:45:17] Operations, Traffic, Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (ema)
[13:48:04] (PS4) BBlack: cp1075-90: define in site.pp [puppet] - https://gerrit.wikimedia.org/r/448508 (https://phabricator.wikimedia.org/T195923)
[13:48:06] (PS1) BBlack: cp1075-90: cache storage parameterization [puppet] - https://gerrit.wikimedia.org/r/448530 (https://phabricator.wikimedia.org/T195923)
[13:59:37] Operations, Traffic, Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (ema)
[14:00:01] PROBLEM - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is CRITICAL: CRITICAL - failed 26 probes of 309 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[14:05:11] RECOVERY - IPv6 ping to codfw on ripe-atlas-codfw IPv6 is OK: OK - failed 12 probes of 309 (alerts on 19) - https://atlas.ripe.net/measurements/1791212/#!map
[14:10:04] Operations, Traffic, Patch-For-Review: Upgrade cache servers to stretch - https://phabricator.wikimedia.org/T200445 (ema) All dependencies are now in stretch-wikimedia. libvmod-netmapper and libvmod-re2 needed to be rebuilt because of updated dependencies. I've released a new debian version of vhtcpd...
[14:17:58] (PS2) Ema: varnish: drop -Werror from cc_command [puppet] - https://gerrit.wikimedia.org/r/448525 (https://phabricator.wikimedia.org/T200445)
[14:18:56] !log execute echo 'https://wikimania.wikimedia.org' | mwscript purgeList.php on mwmain1001
[14:18:59] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:23:19] !log stop and reimage db1094
[14:23:22] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:24:09] (PS1) Jcrespo: Revert "mariadb: Depool db1094 for reimage" [mediawiki-config] - https://gerrit.wikimedia.org/r/448532
[14:25:01] (PS1) Andrew Bogott: add ipv6 to nova controller nodes [puppet] - https://gerrit.wikimedia.org/r/448533
[14:25:03] (PS1) Andrew Bogott: Update nova controller ferm rules for ipv6 [puppet] - https://gerrit.wikimedia.org/r/448534
[14:25:33] (PS1) Jcrespo: mariadb: Repool db1094 with low load after reimage [mediawiki-config] - https://gerrit.wikimedia.org/r/448535
[14:26:43] (PS6) MSantos: Set up cron task to regen low-zoom vector tiles [puppet] - https://gerrit.wikimedia.org/r/447851 (https://phabricator.wikimedia.org/T194787)
[14:28:01] (CR) MSantos: Set up cron task to regen low-zoom vector tiles (2 comments) [puppet] - https://gerrit.wikimedia.org/r/447851 (https://phabricator.wikimedia.org/T194787) (owner: MSantos)
[14:29:32] (CR) Andrew Bogott: [C: 2] add ipv6 to nova controller nodes [puppet] - https://gerrit.wikimedia.org/r/448533 (owner: Andrew Bogott)
[14:34:25] (PS1) Andrew Bogott: Added ipv6 entries for labcontrol1001/1002 [dns] - https://gerrit.wikimedia.org/r/448540
[14:34:28] (PS1) Mark Bergsma: Simplify dependency handling in tox.ini [debs/pybal] - https://gerrit.wikimedia.org/r/448541
[14:34:30] (PS1) Mark Bergsma: Run all tox testing using 'coverage' [debs/pybal] - https://gerrit.wikimedia.org/r/448542
[14:34:32] (PS1) Mark Bergsma: Use tox for running tests in travis [debs/pybal] - https://gerrit.wikimedia.org/r/448543
[14:36:36] !log terbium - removing accountcheck cron job (duplicate mail from the same thing now on mwmaint1001)
[14:36:43] (PS3) BBlack: cp/lvs: add bnxt_en support in NIC tuning stuff [puppet] - https://gerrit.wikimedia.org/r/448506 (https://phabricator.wikimedia.org/T195923)
[14:36:45] (PS3) BBlack: cp1075-90: numa_networking: on [puppet] - https://gerrit.wikimedia.org/r/448507 (https://phabricator.wikimedia.org/T195923)
[14:36:47] (PS2) BBlack: cp1075-90: cache storage parameterization [puppet] - https://gerrit.wikimedia.org/r/448530 (https://phabricator.wikimedia.org/T195923)
[14:36:49] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[14:36:50] (PS5) BBlack: cp1075-90: define in site.pp [puppet] - https://gerrit.wikimedia.org/r/448508 (https://phabricator.wikimedia.org/T195923)
[14:42:57] (PS2) Andrew Bogott: Added ipv6 entries for labcontrol1001/1002 [dns] - https://gerrit.wikimedia.org/r/448540
[14:42:59] (PS1) Andrew Bogott: added ipv6 entries for cloudcontrol1003, 1004 [dns] - https://gerrit.wikimedia.org/r/448544
[14:43:26] (PS4) BBlack: cp/lvs: add bnxt_en support in NIC tuning stuff [puppet] - https://gerrit.wikimedia.org/r/448506 (https://phabricator.wikimedia.org/T195923)
[14:43:28] (PS4) BBlack: cp1075-90: numa_networking: on [puppet] - https://gerrit.wikimedia.org/r/448507 (https://phabricator.wikimedia.org/T195923)
[14:43:30] (PS3) BBlack: cp1075-90: cache storage parameterization [puppet] - https://gerrit.wikimedia.org/r/448530 (https://phabricator.wikimedia.org/T195923)
[14:43:32] (PS6) BBlack: cp1075-90: define in site.pp [puppet] - https://gerrit.wikimedia.org/r/448508 (https://phabricator.wikimedia.org/T195923)
[14:43:34] (CR) Andrew Bogott: [V: 2 C: 2] Added ipv6 entries for labcontrol1001/1002 [dns] - https://gerrit.wikimedia.org/r/448540 (owner: Andrew Bogott)
[14:44:29] (CR) Andrew Bogott: [C: 2] Update nova controller ferm rules for ipv6 [puppet] - https://gerrit.wikimedia.org/r/448534 (owner: Andrew Bogott)
[14:45:07] (PS1) Dzahn: terbium: disable cross-validate-accounts cron job spam [puppet] - https://gerrit.wikimedia.org/r/448545 (https://phabricator.wikimedia.org/T192092)
[14:46:24] (CR) Dzahn: "typo, cloud with 2 d's?" (1 comment) [dns] - https://gerrit.wikimedia.org/r/448544 (owner: Andrew Bogott)
[14:48:06] (PS2) Dzahn: terbium: disable cross-validate-accounts cron job spam [puppet] - https://gerrit.wikimedia.org/r/448545 (https://phabricator.wikimedia.org/T192092)
[14:49:05] (CR) Dzahn: [C: 2] terbium: disable cross-validate-accounts cron job spam [puppet] - https://gerrit.wikimedia.org/r/448545 (https://phabricator.wikimedia.org/T192092) (owner: Dzahn)
[14:49:05] (PS2) Andrew Bogott: added ipv6 entries for cloudcontrol1003, 1004 [dns] - https://gerrit.wikimedia.org/r/448544
[14:49:23] (CR) Mark Bergsma: [C: 2] Simplify dependency handling in tox.ini [debs/pybal] - https://gerrit.wikimedia.org/r/448541 (owner: Mark Bergsma)
[14:49:30] Hey Ops, is the train resuming? I'm asking in order to close next week's Tech News.
[14:50:04] (Merged) jenkins-bot: Simplify dependency handling in tox.ini [debs/pybal] - https://gerrit.wikimedia.org/r/448541 (owner: Mark Bergsma)
[14:52:08] (CR) Dzahn: [C: 1] added ipv6 entries for cloudcontrol1003, 1004 [dns] - https://gerrit.wikimedia.org/r/448544 (owner: Andrew Bogott)
[14:52:18] jouncebot: next
[14:52:20] In 68 hour(s) and 7 minute(s): Wikimedia Portals Update (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180730T1100)
[14:52:20] In 68 hour(s) and 7 minute(s): European Mid-day SWAT(Max 6 patches) (https://wikitech.wikimedia.org/wiki/Deployments#deploycal-item-20180730T1100)
[14:53:00] Trizek: probably train-related questions should go to release engineering, try #wikimedia-releng if nobody answers here
[14:53:22] I... Oh, indeed wrong channel. Thanks jynus
[14:53:22] (CR) Mark Bergsma: [C: 2] Run all tox testing using 'coverage' [debs/pybal] - https://gerrit.wikimedia.org/r/448542 (owner: Mark Bergsma)
[14:53:40] Trizek: actually the channel is right, it may just get lost here
[14:53:43] Trizek: the latest update i see is "the train is still blocked." 20 hours ago on an email to engineering
[14:53:50] (CR) Mark Bergsma: [C: 2] Use tox for running tests in travis [debs/pybal] - https://gerrit.wikimedia.org/r/448543 (owner: Mark Bergsma)
[14:54:01] (Merged) jenkins-bot: Run all tox testing using 'coverage' [debs/pybal] - https://gerrit.wikimedia.org/r/448542 (owner: Mark Bergsma)
[14:54:02] mutante: did you saw my terbium-related patch?
[14:54:07] *see
[14:54:10] mutante: I have the same one, but the blocking task has been closed as resolved since.
[14:54:25] Trizek: to be sure ask greg-g
[14:54:31] (PS1) Andrew Bogott: Added ipv6 entries for labtestcontrol2001 and 2003 [dns] - https://gerrit.wikimedia.org/r/448550
[14:54:45] jynus: i saw your comment but didn't get to the code yet :) but it sounded nice that it would unblock terbium decom :)
[14:54:48] (Merged) jenkins-bot: Use tox for running tests in travis [debs/pybal] - https://gerrit.wikimedia.org/r/448543 (owner: Mark Bergsma)
[14:54:58] (CR) Andrew Bogott: [C: 2] added ipv6 entries for cloudcontrol1003, 1004 [dns] - https://gerrit.wikimedia.org/r/448544 (owner: Andrew Bogott)
[14:55:08] mutante: that is why I hoped it would be interesting for you
[14:55:55] it is, i have a tab open now :)
[14:56:14] no rush however
[14:56:59] it is definitely right that we don't want to "include ::apache" things anymore and instead use httpd class in the role
[14:57:37] httpd::site and the package names for stretch.. ack.. lgtm
[14:57:43] we can put in a ball all code that will eventually die and put it in dbmonitor
[14:58:15] but that will give us maybe 1 year to properly replace it
[14:58:49] (PS5) BBlack: cp/lvs: add bnxt_en support in NIC tuning stuff [puppet] - https://gerrit.wikimedia.org/r/448506 (https://phabricator.wikimedia.org/T195923)
[14:58:51] (PS5) BBlack: cp1075-90: numa_networking: on [puppet] - https://gerrit.wikimedia.org/r/448507 (https://phabricator.wikimedia.org/T195923)
[14:58:53] (PS4) BBlack: cp1075-90: cache storage parameterization [puppet] - https://gerrit.wikimedia.org/r/448530 (https://phabricator.wikimedia.org/T195923)
[14:58:55] (PS7) BBlack: cp1075-90: define in site.pp [puppet] - https://gerrit.wikimedia.org/r/448508 (https://phabricator.wikimedia.org/T195923)
[14:58:57] (PS3) Ema: varnish: drop -Werror from cc_command [puppet] - https://gerrit.wikimedia.org/r/448525 (https://phabricator.wikimedia.org/T200445)
[14:59:06] i'm surprised about the relation to noc module.. looking at that now
[14:59:17] RECOVERY - Router interfaces on cr1-codfw is OK: OK: host 208.80.153.192, interfaces up: 126, down: 0, dormant: 0, excluded: 0, unused: 0
[14:59:17] RECOVERY - Router interfaces on cr2-ulsfo is OK: OK: host 198.35.26.193, interfaces up: 77, down: 0, dormant: 0, excluded: 0, unused: 0
[14:59:47] PROBLEM - Check systemd state on cloudcontrol1004 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[14:59:47] yea, i have seen that before. you should not have to worry about that jenkins downvote just because you touch that file.. before noc itself is converted
[14:59:48] (CR) Jcrespo: [C: 2] mariadb: Repool db1094 with low load after reimage [mediawiki-config] - https://gerrit.wikimedia.org/r/448535 (owner: Jcrespo)
[15:00:13] i totally agree it just has lint-ignore and should be fixed separately
[15:00:52] I am almost sure there is no reason to be on the same host
[15:01:05] I prefer all untrusted parts being on a single VM
[15:01:20] (Merged) jenkins-bot: mariadb: Repool db1094 with low load after reimage [mediawiki-config] - https://gerrit.wikimedia.org/r/448535 (owner: Jcrespo)
[15:01:26] I think it shares codebase aside from backend with tendril
[15:01:38] (PS1) Amire80: Enable SandboxLink on the isiZulu Wikipedia [mediawiki-config] - https://gerrit.wikimedia.org/r/448553 (https://phabricator.wikimedia.org/T200522)
[15:01:40] (CR) jenkins-bot: mariadb: Repool db1094 with low load after reimage [mediawiki-config] - https://gerrit.wikimedia.org/r/448535 (owner: Jcrespo)
[15:01:42] BUT, maybe it has some hidden dependency with mediawi-repo?
[15:03:08] i also think there is no reason to be on the same host. i am compiling that now
[15:03:58] we could do it a bit more refined (role/profile), but as I said, I want to do the minimum before killing it
[15:04:48] (PS4) Ema: varnish: drop -Werror from cc_command [puppet] - https://gerrit.wikimedia.org/r/448525 (https://phabricator.wikimedia.org/T200445)
[15:05:37] (PS6) BBlack: cp/lvs: add bnxt_en support in NIC tuning stuff [puppet] - https://gerrit.wikimedia.org/r/448506 (https://phabricator.wikimedia.org/T195923)
[15:05:39] (PS6) BBlack: cp1075-90: numa_networking: on [puppet] - https://gerrit.wikimedia.org/r/448507 (https://phabricator.wikimedia.org/T195923)
[15:05:41] (PS5) BBlack: cp1075-90: cache storage parameterization [puppet] - https://gerrit.wikimedia.org/r/448530 (https://phabricator.wikimedia.org/T195923)
[15:05:43] (PS8) BBlack: cp1075-90: define in site.pp [puppet] - https://gerrit.wikimedia.org/r/448508 (https://phabricator.wikimedia.org/T195923)
[15:06:08] PROBLEM - Check systemd state on labtestcontrol2003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[15:06:51] !log jynus@deploy1001 Synchronized wmf-config/db-eqiad.php: Repool db1094 with low load (duration: 00m 55s)
[15:06:53] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:08:09] (PS2) Jcrespo: Revert "mariadb: Depool db1094 for reimage" [mediawiki-config] - https://gerrit.wikimedia.org/r/448532
[15:08:25] (PS7) BBlack: cp/lvs: add bnxt_en support in NIC tuning stuff [puppet] - https://gerrit.wikimedia.org/r/448506 (https://phabricator.wikimedia.org/T195923)
[15:08:27] (PS7) BBlack: cp1075-90: numa_networking: on [puppet] - https://gerrit.wikimedia.org/r/448507 (https://phabricator.wikimedia.org/T195923)
[15:08:29] (PS6) BBlack: cp1075-90: cache storage parameterization [puppet] - https://gerrit.wikimedia.org/r/448530 (https://phabricator.wikimedia.org/T195923)
[15:08:31] (PS9) BBlack: cp1075-90: define in site.pp [puppet] - https://gerrit.wikimedia.org/r/448508 (https://phabricator.wikimedia.org/T195923)
[15:08:32] bah
[15:09:55] jynus: i was about to ask where the "httpd" class gets included so that tendril already has apache.. but i saw now it's in the tendril module. so that looks all good to me; it also compiles fine
[15:10:16] (PS1) Mark Bergsma: Do not import the test sub package from the main pybal package [debs/pybal] - https://gerrit.wikimedia.org/r/448557
[15:10:40] (CR) Dzahn: [C: 1] "looks all good to me. yes we should just lint-ignore the apache class includes in the noc module and fix that separately. it's great that " [puppet] - https://gerrit.wikimedia.org/r/445597 (https://phabricator.wikimedia.org/T192092) (owner: Jcrespo)
[15:11:05] (CR) Andrew Bogott: [C: 2] Added ipv6 entries for labtestcontrol2001 and 2003 [dns] - https://gerrit.wikimedia.org/r/448550 (owner: Andrew Bogott)
[15:11:30] mutante: I know the right way to do it would be creating a common profile
[15:11:55] but given the comment and the fact that technically dbtree is tendril, I hope we can make an exception
[15:12:13] again, thinking this will be mostly deprecated soon
[15:12:16] yea, it's fine. you are just being pragmatic
[15:12:30] (CR) Mark Bergsma: [C: 2] Do not import the test sub package from the main pybal package [debs/pybal] - https://gerrit.wikimedia.org/r/448557 (owner: Mark Bergsma)
[15:12:39] and if it wasn't for some reason, we would refactor on a separate patch
[15:13:09] of course, yea, agree with all that
[15:13:11] (Merged) jenkins-bot: Do not import the test sub package from the main pybal package [debs/pybal] - https://gerrit.wikimedia.org/r/448557 (owner: Mark Bergsma)
[15:14:20] (PS2) Dzahn: tendril: use ::profile::base::firewall [puppet] - https://gerrit.wikimedia.org/r/447942
[15:14:32] also i have this ^, it will do nothing :)
[15:15:31] that is ok, and you can merge that now
[15:15:56] k, thanks
[15:15:57] (CR) Dzahn: [C: 2] tendril: use ::profile::base::firewall [puppet] - https://gerrit.wikimedia.org/r/447942 (owner: Dzahn)
[15:15:58] I was going to wait until Monday so that others could comment
[15:16:18] plus me being able to deploy on a non-Friday evening
[15:16:36] I will also need to change the pointer in the misc configuration
[15:16:37] right, it's Friday, true
[15:16:48] also for others to reevaluate the reviews
[15:17:12] there is one thing that changes here.. the backend now is in wikimedia.org not eqiad.wmnet
[15:17:26] yes
[15:17:40] do you see any problem with just changing the backend pointer?
[15:18:06] no, i don't. that should be fine, we do that with others. maybe just have to double check the ferm rules
[15:18:16] that they allow cache-misc to talk to it
[15:18:22] mmm
[15:18:40] ok, will do that; in any case, a problem for the followup patch :-)
[15:18:52] and this just changes the backend
[15:19:28] that will do https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/445596/
[15:19:35] but with dbmonitor
[15:20:31] !log joal@deploy1001 Started deploy [analytics/aqs/deploy@6fafc63]: Update AQS new-pages metric
[15:20:33] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:20:34] sadly, dbtree cannot work even in read-only on codfw because of how tendril works
[15:21:41] jynus: i just saw the code slightly changed for that
[15:21:47] srange => '$CACHES',
[15:22:23] this is what is used now in ferm rules to allow from cp nodes
[15:23:11] ex. modules/profile/manifests/racktables.pp
[15:24:43] !log joal@deploy1001 Finished deploy [analytics/aqs/deploy@6fafc63]: Update AQS new-pages metric (duration: 04m 12s)
[15:24:45] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[15:25:18] but it looks like role(tendril) already allows http/https from any, so it should just work
[15:25:54] i don't think you have to do anything there besides switch the backend.. yep
[15:26:05] (CR) BBlack: [C: 2] cp/lvs: add bnxt_en support in NIC tuning stuff [puppet] - https://gerrit.wikimedia.org/r/448506 (https://phabricator.wikimedia.org/T195923) (owner: BBlack)
[15:26:23] bleh rebase time!
[15:26:42] (PS8) BBlack: cp/lvs: add bnxt_en support in NIC tuning stuff [puppet] - https://gerrit.wikimedia.org/r/448506 (https://phabricator.wikimedia.org/T195923)
[15:26:44] (PS8) BBlack: cp1075-90: numa_networking: on [puppet] - https://gerrit.wikimedia.org/r/448507 (https://phabricator.wikimedia.org/T195923)
[15:26:46] (PS7) BBlack: cp1075-90: cache storage parameterization [puppet] - https://gerrit.wikimedia.org/r/448530 (https://phabricator.wikimedia.org/T195923)
[15:26:48] (PS10) BBlack: cp1075-90: define in site.pp [puppet] - https://gerrit.wikimedia.org/r/448508 (https://phabricator.wikimedia.org/T195923)
[15:26:50] (PS1) Elukey: role::archiva: move to profile [puppet] - https://gerrit.wikimedia.org/r/448569
[15:27:28] (CR) BBlack: [C: 2] cp1075-90: numa_networking: on [puppet] - https://gerrit.wikimedia.org/r/448507 (https://phabricator.wikimedia.org/T195923) (owner: BBlack)
[15:27:38] (CR) BBlack: [C: 2] cp1075-90: cache storage parameterization [puppet] - https://gerrit.wikimedia.org/r/448530 (https://phabricator.wikimedia.org/T195923) (owner: BBlack)
[15:27:40] (CR) Dzahn: [C: -1] "yea, we certainly could.. but there are also disadvantages of adding more entries to DNS just for redirects.. i am not sure myself. not a " [dns] - https://gerrit.wikimedia.org/r/441817 (https://phabricator.wikimedia.org/T189637) (owner: MarcoAurelio)
[15:27:47] (CR) BBlack: [C: 2] cp1075-90: define in site.pp [puppet] - https://gerrit.wikimedia.org/r/448508 (https://phabricator.wikimedia.org/T195923) (owner: BBlack)
[15:30:16] (CR) Dzahn: [C: 1] "for this one, maybe you could ask Email: wikimania-scholarships@wikimedia.org if they could make a simple test after the merge. it looks" [puppet] - https://gerrit.wikimedia.org/r/441133 (https://phabricator.wikimedia.org/T196920) (owner: Herron)
[15:34:22] (CR) Elukey: [C: 2] role::archiva: move to profile [puppet] - https://gerrit.wikimedia.org/r/448569 (owner: Elukey)
[15:34:31] (PS14) Vgutierrez: WIP: provide ACMEv2 support based on certbot/acme library [software/certcentral] - https://gerrit.wikimedia.org/r/446618 (https://phabricator.wikimedia.org/T199717)
[15:34:47] (PS2) Elukey: role::archiva: move to profile [puppet] - https://gerrit.wikimedia.org/r/448569
[15:35:11] (CR) jerkins-bot: [V: -1] WIP: provide ACMEv2 support based on certbot/acme library [software/certcentral] - https://gerrit.wikimedia.org/r/446618 (https://phabricator.wikimedia.org/T199717) (owner: Vgutierrez)
[15:43:12] (CR) Herron: "good idea! I'll try this" [puppet] - https://gerrit.wikimedia.org/r/441133 (https://phabricator.wikimedia.org/T196920) (owner: Herron)
[15:45:25] (CR) Herron: "> I (or anyone with an account) can use https://iegreview.wikimedia.org/account/recover" [puppet] - https://gerrit.wikimedia.org/r/441131 (https://phabricator.wikimedia.org/T196920) (owner: Herron)
[15:46:49] (CR) Vgutierrez: "recheck" [software/certcentral] - https://gerrit.wikimedia.org/r/446618 (https://phabricator.wikimedia.org/T199717) (owner: Vgutierrez)
[15:47:32] (CR) jerkins-bot: [V: -1] WIP: provide ACMEv2 support based on certbot/acme library [software/certcentral] - https://gerrit.wikimedia.org/r/446618 (https://phabricator.wikimedia.org/T199717) (owner: Vgutierrez)
[15:47:49] sigh.. it's still using the old one :/
[15:51:40] (CR) Alexandros Kosiaris: [C: -2] "I have to add that having both striker.wikimedia.org AND admin.toolforge.org as noted in the last comment would greatly confuse me." [dns] - https://gerrit.wikimedia.org/r/441817 (https://phabricator.wikimedia.org/T189637) (owner: MarcoAurelio)
[15:52:32] (CR) Hashar: "recheck" [software/certcentral] - https://gerrit.wikimedia.org/r/446618 (https://phabricator.wikimedia.org/T199717) (owner: Vgutierrez)
[15:54:37] PROBLEM - High CPU load on API appserver on mw1285 is CRITICAL: CRITICAL - load average: 81.30, 33.58, 19.09
[15:55:47] RECOVERY - High CPU load on API appserver on mw1285 is OK: OK - load average: 28.68, 27.47, 18.01
[15:57:03] Operations, Traffic, Patch-For-Review: Pick up a suitable ACME library for certcentral - https://phabricator.wikimedia.org/T199717 (Vgutierrez)
[15:57:05] Operations, Traffic, Continuous-Integration-Config: Provide a CI container with pebble - https://phabricator.wikimedia.org/T200405 (Vgutierrez) Open>Resolved
[15:58:52] (CR) Vgutierrez: "nice,integration tests against pebble are working and pebble log can be checked in https://integration.wikimedia.org/ci/job/certcentral-to" [software/certcentral] - https://gerrit.wikimedia.org/r/446618 (https://phabricator.wikimedia.org/T199717) (owner: Vgutierrez)
[16:27:47] Operations, ops-codfw, Analytics, Analytics-Kanban, and 5 others: EventStreams accumulates too much memory on SCB nodes in CODFW - https://phabricator.wikimedia.org/T199813 (mobrovac) Open>Resolved This has finally been resolved for good. Here's a summary/post-mortem for clarity and poste...
[16:30:17] (PS2) Dzahn: prometheus: use ::profile::base::firewall [puppet] - https://gerrit.wikimedia.org/r/448084
[16:31:16] (CR) Dzahn: [C: 2] prometheus: use ::profile::base::firewall [puppet] - https://gerrit.wikimedia.org/r/448084 (owner: Dzahn)
[16:31:38] (Abandoned) Jcrespo: [WIP] Create framework to transfer files over the LAN [puppet] - https://gerrit.wikimedia.org/r/326155 (owner: Jcrespo)
[16:32:24] (CR) Jcrespo: [C: -1] "This probably needs to be abandoned." [puppet] - https://gerrit.wikimedia.org/r/345847 (https://phabricator.wikimedia.org/T157359) (owner: Jcrespo)
[16:33:59] Operations, Cloud-Services, Cloud-VPS, DBA, Patch-For-Review: labsdb1006/1007 (postgresql) maintenance - https://phabricator.wikimedia.org/T157359 (jcrespo)
[16:35:27] (Abandoned) Jcrespo: osm-labs: Fix replication credentials [puppet] - https://gerrit.wikimedia.org/r/345847 (https://phabricator.wikimedia.org/T157359) (owner: Jcrespo)
[16:36:10] (CR) Dzahn: [C: 2] "noop on prometheus1003/2004 like all the others" [puppet] - https://gerrit.wikimedia.org/r/448084 (owner: Dzahn)
[16:37:13] (PS3) Dzahn: wdqs/labs: use ::profile::base::firewall [puppet] - https://gerrit.wikimedia.org/r/448157
[16:39:37] (CR) Dzahn: [C: 2] wdqs/labs: use ::profile::base::firewall [puppet] - https://gerrit.wikimedia.org/r/448157 (owner: Dzahn)
[16:39:45] (PS1) Dzahn: sca/scb: use ::profile::base::firewall [puppet] - https://gerrit.wikimedia.org/r/448583
[16:40:58] (PS2) Dzahn: swift: use ::profile::base::firewall [puppet] - https://gerrit.wikimedia.org/r/448087
[16:41:17] (PS3) Dzahn: swift: use ::profile::base::firewall [puppet] - https://gerrit.wikimedia.org/r/448087
[16:43:23] !log depooling wdqs1003 to let it catch up on update lag
[16:43:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[16:50:58] (CR) Dzahn: [C: 2] swift: use ::profile::base::firewall [puppet] - https://gerrit.wikimedia.org/r/448087 (owner: Dzahn)
[16:56:25] (PS1) Dzahn: puppetmaster: use ::profile::base::firewall [puppet] - https://gerrit.wikimedia.org/r/448587
[17:05:44] (PS1) Gehel: wdqs: disable categories reload [puppet] - https://gerrit.wikimedia.org/r/448591 (https://phabricator.wikimedia.org/T200202)
[17:06:14] !log repooling wdqs1003
[17:06:17] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[17:12:01] (CR) Gehel: [C: 2] "ppc looks good: https://puppet-compiler.wmflabs.org/compiler02/11898/" [puppet] - https://gerrit.wikimedia.org/r/448591 (https://phabricator.wikimedia.org/T200202) (owner: Gehel)
[17:12:55] (CR) Smalyshev: [C: 1] wdqs: disable categories reload [puppet] - https://gerrit.wikimedia.org/r/448591 (https://phabricator.wikimedia.org/T200202) (owner: Gehel)
[17:15:04] (PS1) Dzahn: mediawiki::maintenance: stop including php5 packages on jessie [puppet] - https://gerrit.wikimedia.org/r/448594 (https://phabricator.wikimedia.org/T192092)
[17:16:08] PROBLEM - puppet last run on wdqs1003 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:17:38] ^fix coming up (me being dumb again)
[17:17:46] (PS1) Gehel: wdqs: fix ensure of reload categories cron [puppet] - https://gerrit.wikimedia.org/r/448597 (https://phabricator.wikimedia.org/T200202)
[17:18:44] (CR) Gehel: [C: 2] wdqs: fix ensure of reload categories cron [puppet] - https://gerrit.wikimedia.org/r/448597 (https://phabricator.wikimedia.org/T200202) (owner: Gehel)
[17:19:17] PROBLEM - puppet last run on wdqs2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:20:38] PROBLEM - puppet last run on wdqs1005 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:21:18] RECOVERY - puppet last run on wdqs1003 is OK: OK: Puppet is currently enabled, last run 1 minute ago with 0 failures
[17:22:47] PROBLEM - puppet last run on wdqs2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:25:23] (PS1) Dzahn: add fake secrets for mcrouter to unbreak puppet compile on mwmaint2001 [labs/private] - https://gerrit.wikimedia.org/r/448599
[17:25:51] (CR) Dzahn: [V: 2 C: 2] "empty files" [labs/private] - https://gerrit.wikimedia.org/r/448599 (owner: Dzahn)
[17:27:57] RECOVERY - puppet last run on wdqs2001 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures
[17:29:56] (CR) Dzahn: [C: 1] "only touches terbium - http://puppet-compiler.wmflabs.org/11900/" [puppet] - https://gerrit.wikimedia.org/r/448594 (https://phabricator.wikimedia.org/T192092) (owner: Dzahn)
[17:31:16] (CR) Dzahn: [C: 2] mediawiki::maintenance: stop including php5 packages on jessie [puppet] - https://gerrit.wikimedia.org/r/448594 (https://phabricator.wikimedia.org/T192092) (owner: Dzahn)
[17:31:23] (PS2) Dzahn: mediawiki::maintenance: stop including php5 packages on jessie [puppet] - https://gerrit.wikimedia.org/r/448594 (https://phabricator.wikimedia.org/T192092)
[17:38:20] (CR) Dzahn: [C: 2] "it won't affect dbtree 100% because i'm not even removing the installed packages, just cleaning up code that would affect new instances on" [puppet] - https://gerrit.wikimedia.org/r/448594 (https://phabricator.wikimedia.org/T192092) (owner: Dzahn)
[17:45:47] (PS1) Dzahn: mw::maint: allow home dir rsync between mwmaint, not terbium [puppet] - https://gerrit.wikimedia.org/r/448608 (https://phabricator.wikimedia.org/T192092)
[17:46:50] (PS2) Dzahn: mw::maint: allow home dir rsync between mwmaint, not terbium [puppet] - https://gerrit.wikimedia.org/r/448608 (https://phabricator.wikimedia.org/T192092)
[17:48:05] (PS3) Dzahn: mw::maint: allow home dir rsync between mwmaint, not terbium [puppet] - https://gerrit.wikimedia.org/r/448608 (https://phabricator.wikimedia.org/T192092)
[17:50:07] RECOVERY - puppet last run on wdqs2004 is OK: OK: Puppet is
currently enabled, last run 3 minutes ago with 0 failures [17:51:18] RECOVERY - puppet last run on wdqs1005 is OK: OK: Puppet is currently enabled, last run 3 minutes ago with 0 failures [18:09:19] (03CR) 10Dzahn: [C: 032] mw::maint: allow home dir rsync between mwmaint, not terbium [puppet] - 10https://gerrit.wikimedia.org/r/448608 (https://phabricator.wikimedia.org/T192092) (owner: 10Dzahn) [18:12:10] !log terbium: deleting rsyncd config fragment, stopping rsyncd, not used anymore, decom prep [18:12:13] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:15:47] !log syncing home dirs from mwmaint1001 to mwmaint2001 (once, manually, not currently set to auto-sync, but to keep another copy of former terbium homes in case we failover) (T192092) [18:15:50] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:15:51] T192092: setup replacements for maintenance_server (terbium, wasat) on Stretch - https://phabricator.wikimedia.org/T192092 [18:18:12] (03CR) 10Dzahn: [C: 032] "14:12 < mutante> !log terbium: deleting rsyncd config fragment, stopping rsyncd, not used anymore, decom prep" [puppet] - 10https://gerrit.wikimedia.org/r/448608 (https://phabricator.wikimedia.org/T192092) (owner: 10Dzahn) [18:22:54] (03PS1) 10Dzahn: scap: remove terbium from dsh groups [puppet] - 10https://gerrit.wikimedia.org/r/448617 (https://phabricator.wikimedia.org/T192092) [18:23:24] (03PS2) 10Dzahn: scap: remove terbium from dsh groups [puppet] - 10https://gerrit.wikimedia.org/r/448617 (https://phabricator.wikimedia.org/T192092) [18:27:33] (03CR) 10Dzahn: [C: 032] scap: remove terbium from dsh groups [puppet] - 10https://gerrit.wikimedia.org/r/448617 (https://phabricator.wikimedia.org/T192092) (owner: 10Dzahn) [18:27:37] RECOVERY - Check systemd state on labtestcontrol2003 is OK: OK - running: The system is fully operational [18:27:48] PROBLEM - https://grafana.wikimedia.org/dashboard/db/varnish-http-requests grafana alert on einsteinium 
is CRITICAL: CRITICAL: https://grafana.wikimedia.org/dashboard/db/varnish-http-requests is alerting: 70% GET drop in 30min alert. [18:28:51] ema: ^ how unusual is this? [18:28:58] RECOVERY - Check systemd state on cloudcontrol1004 is OK: OK - running: The system is fully operational [18:30:03] when i zoom out on that graph, it seems not very [18:35:48] RECOVERY - https://grafana.wikimedia.org/dashboard/db/varnish-http-requests grafana alert on einsteinium is OK: OK: https://grafana.wikimedia.org/dashboard/db/varnish-http-requests is not alerting. [18:36:02] ok [18:42:43] (03Abandoned) 10MarcoAurelio: Create site striker.wikimedia.org [dns] - 10https://gerrit.wikimedia.org/r/441817 (https://phabricator.wikimedia.org/T189637) (owner: 10MarcoAurelio) [18:51:20] (03PS1) 10Dzahn: network::constants: remove terbium.eqiad.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/448625 (https://phabricator.wikimedia.org/T192092) [18:52:52] (03CR) 10Dzahn: "apparently the only place using this list of maintenance hosts from constants.pp is elasticsearch::relforge" [puppet] - 10https://gerrit.wikimedia.org/r/448625 (https://phabricator.wikimedia.org/T192092) (owner: 10Dzahn) [18:53:23] 10Operations, 10Phabricator, 10Traffic, 10Zero: Missing IP addresses for Maroc Telecom - https://phabricator.wikimedia.org/T174342 (10Aklapper) [18:54:25] (03PS2) 10Dzahn: network::constants: remove terbium.eqiad.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/448625 (https://phabricator.wikimedia.org/T192092) [18:57:01] !log temporarily depooled wdq1003 to let it catch up with lag [18:57:04] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log [18:59:03] (03CR) 10Dzahn: [C: 032] network::constants: remove terbium.eqiad.wmnet [puppet] - 10https://gerrit.wikimedia.org/r/448625 (https://phabricator.wikimedia.org/T192092) (owner: 10Dzahn) [19:01:38] PROBLEM - mediawiki-installation DSH group on terbium is CRITICAL: Host terbium is not in mediawiki-installation dsh group 
[19:02:01] ^ yea, it shouldn't be, i want to remove it
[19:04:53] !log terbium is being removed from ferm rules on elasticsearch/relforge, logstash/collector, mariadb/labtestwikitech and mw-maintenance itself (T192092)
[19:04:56] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[19:04:57] T192092: setup replacements for maintenance_server (terbium, wasat) on Stretch - https://phabricator.wikimedia.org/T192092
[19:09:28] ACKNOWLEDGEMENT - mediawiki-installation DSH group on terbium is CRITICAL: Host terbium is not in mediawiki-installation dsh group daniel_zahn decom (T192092)
[19:17:42] (PS5) Dzahn: postgresql: add class to create db backups [puppet] - https://gerrit.wikimedia.org/r/447844 (https://phabricator.wikimedia.org/T190184)
[19:19:18] (PS6) Dzahn: netbox: add psql dump cron and back it up [puppet] - https://gerrit.wikimedia.org/r/447842 (https://phabricator.wikimedia.org/T190184)
[19:19:53] (Abandoned) Dzahn: decom terbium: rm from scap,site,dhcp,network constants [puppet] - https://gerrit.wikimedia.org/r/431041 (https://phabricator.wikimedia.org/T192092) (owner: Dzahn)
[19:20:05] (CR) jerkins-bot: [V: -1] netbox: add psql dump cron and back it up [puppet] - https://gerrit.wikimedia.org/r/447842 (https://phabricator.wikimedia.org/T190184) (owner: Dzahn)
[19:20:06] (Restored) Dzahn: decom terbium: rm from scap,site,dhcp,network constants [puppet] - https://gerrit.wikimedia.org/r/431041 (https://phabricator.wikimedia.org/T192092) (owner: Dzahn)
[19:21:32] (Abandoned) Dzahn: decom terbium: rm from scap,site,dhcp,network constants [puppet] - https://gerrit.wikimedia.org/r/431041 (https://phabricator.wikimedia.org/T192092) (owner: Dzahn)
[19:23:51] Operations, Wikimedia-Logstash, monitoring, User-herron: Send logstash service metrics to prometheus - https://phabricator.wikimedia.org/T200362 (herron) Also set up prometheus-logstash-exporter (https://gitlab.com/alxrem/prometheus-logstash-exporter). Example metrics: P7394 At a quick glance p...
[19:29:55] (PS3) Dzahn: sentry: base::service_unit -> systemd::service [puppet] - https://gerrit.wikimedia.org/r/434539 (https://phabricator.wikimedia.org/T194724)
[19:31:15] (CR) Dzahn: [C: 2] sentry: base::service_unit -> systemd::service [puppet] - https://gerrit.wikimedia.org/r/434539 (https://phabricator.wikimedia.org/T194724) (owner: Dzahn)
[19:31:48] (CR) Dzahn: [C: 2] "fixed the TODO and watching it on deployment-sentry01 (https://tools.wmflabs.org/openstack-browser/puppetclass/role::sentry)" [puppet] - https://gerrit.wikimedia.org/r/434539 (https://phabricator.wikimedia.org/T194724) (owner: Dzahn)
[19:40:22] (PS1) Dzahn: simplelap: apache -> httpd module [puppet] - https://gerrit.wikimedia.org/r/448635
[19:41:00] (CR) Dzahn: "https://tools.wmflabs.org/openstack-browser/puppetclass/role::simplelap" [puppet] - https://gerrit.wikimedia.org/r/448635 (owner: Dzahn)
[19:41:41] (CR) Dzahn: "only used by a single project (as opposed to simplelamp with m for mysql)" [puppet] - https://gerrit.wikimedia.org/r/448635 (owner: Dzahn)
[19:42:11] (PS2) Dzahn: simplelap: apache -> httpd module [puppet] - https://gerrit.wikimedia.org/r/448635
[19:46:46] (PS1) Andrew Bogott: rabbitmq: allow access to designate via ipv6 [puppet] - https://gerrit.wikimedia.org/r/448637
[19:49:37] (CR) Andrew Bogott: [C: 2] rabbitmq: allow access to designate via ipv6 [puppet] - https://gerrit.wikimedia.org/r/448637 (owner: Andrew Bogott)
[20:03:43] (PS1) Andrew Bogott: Allow labtestservices boxes to access the labtestcontrol db via ipv6 [puppet] - https://gerrit.wikimedia.org/r/448650
[20:04:52] (CR) Andrew Bogott: [C: 2] Allow labtestservices boxes to access the labtestcontrol db via ipv6 [puppet] - https://gerrit.wikimedia.org/r/448650 (owner: Andrew Bogott)
[20:07:37] (PS1) Andrew Bogott: Added some much-needed parentheses to a ferm rule [puppet] - https://gerrit.wikimedia.org/r/448651
[20:08:40] (CR) Andrew Bogott: [C: 2] Added some much-needed parentheses to a ferm rule [puppet] - https://gerrit.wikimedia.org/r/448651 (owner: Andrew Bogott)
[20:09:57] PROBLEM - Check systemd state on cloudcontrol1003 is CRITICAL: CRITICAL - degraded: The system is operational but one or more units failed.
[20:37:17] RECOVERY - Check systemd state on cloudcontrol1003 is OK: OK - running: The system is fully operational
[20:40:23] !log re-pooled wdqs1003
[20:40:26] Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[20:50:41] Operations, Wikidata, Wikidata-Query-Service, Patch-For-Review: WDQS disk usage increase is correlated with reloading of categories - https://phabricator.wikimedia.org/T200202 (Smalyshev) Generally since new categories are loaded before old ones are deleted, the space bump is expected - and Blaze...
[20:52:21] Operations, Wikidata, Wikidata-Query-Service, Patch-For-Review: WDQS disk usage increase is correlated with reloading of categories - https://phabricator.wikimedia.org/T200202 (Smalyshev)
[20:52:47] Operations, Wikidata, Wikidata-Query-Service, Patch-For-Review: WDQS disk usage increase is correlated with reloading of categories - https://phabricator.wikimedia.org/T200202 (Smalyshev) Also, once T198356 is implemented, we won't need to reload category namespace (at least not too often) so tha...
[20:58:57] Operations, Discovery, Traffic, Wikidata, and 2 others: Consider switching to HTTPS for Wikidata query service links - https://phabricator.wikimedia.org/T153563 (Smalyshev)
[20:59:00] Operations, Discovery, WMDE-Tech-Communication, Wikidata, and 2 others: announce breaking change: http > https for entities in rdf - https://phabricator.wikimedia.org/T154015 (Smalyshev) stalled>declined Since parent is declined, declining this too.
[20:59:25] Operations, Discovery, Traffic, Wikidata, and 2 others: compile number of http uses for http://www.wikidata.org/entity - https://phabricator.wikimedia.org/T154017 (Smalyshev) stalled>declined
[20:59:27] Operations, Discovery, Traffic, Wikidata, and 2 others: Consider switching to HTTPS for Wikidata query service links - https://phabricator.wikimedia.org/T153563 (Smalyshev)
[20:59:56] Operations, Performance-Team, Traffic, Wikimedia-General-or-Unknown: Search engines continue to link to JS-redirect destination after Wikipedia copyright protest - https://phabricator.wikimedia.org/T199252 (Imarlier) a:Imarlier
[21:09:08] Operations, Performance-Team, Traffic, Wikimedia-General-or-Unknown: Search engines continue to link to JS-redirect destination after Wikipedia copyright protest - https://phabricator.wikimedia.org/T199252 (Imarlier) Here's what I know at this point: 1. Google did, in fact, last index it.m.wik...
[21:10:34] Operations, Performance-Team, Traffic, Wikimedia-General-or-Unknown: Search engines continue to link to JS-redirect destination after Wikipedia copyright protest - https://phabricator.wikimedia.org/T199252 (Imarlier) @dr0ptp4kt @Dzahn or others: Any chance of also giving me access to it.wikipedia...
[21:18:42] Operations, Performance-Team, Traffic, Wikimedia-General-or-Unknown: Search engines continue to link to JS-redirect destination after Wikipedia copyright protest - https://phabricator.wikimedia.org/T199252 (dr0ptp4kt) Update to the ticket: webmaster console access has been provided to Ian for htt...
[21:21:12] (CR) Dzahn: [C: 2] "openstack-browser showed only 1 project using this and that has been deleted. so nothing should currently use this. testing it on a fresh " [puppet] - https://gerrit.wikimedia.org/r/448635 (owner: Dzahn)
[21:21:21] (PS3) Dzahn: simplelap: apache -> httpd module [puppet] - https://gerrit.wikimedia.org/r/448635
[21:26:45] (CR) Dzahn: [C: 2] "Active: active (running) since Wed 2018-06-06 16:35:09 UTC; 1 months 20 days ago" [puppet] - https://gerrit.wikimedia.org/r/434539 (https://phabricator.wikimedia.org/T194724) (owner: Dzahn)
[21:30:35] (PS1) Dzahn: ircecho: base::service_unit -> systemd::service [puppet] - https://gerrit.wikimedia.org/r/448770 (https://phabricator.wikimedia.org/T194724)
[21:31:13] (CR) jerkins-bot: [V: -1] ircecho: base::service_unit -> systemd::service [puppet] - https://gerrit.wikimedia.org/r/448770 (https://phabricator.wikimedia.org/T194724) (owner: Dzahn)
[21:31:33] (PS2) Dzahn: ircecho: base::service_unit -> systemd::service [puppet] - https://gerrit.wikimedia.org/r/448770 (https://phabricator.wikimedia.org/T194724)
[21:32:04] (CR) jerkins-bot: [V: -1] ircecho: base::service_unit -> systemd::service [puppet] - https://gerrit.wikimedia.org/r/448770 (https://phabricator.wikimedia.org/T194724) (owner: Dzahn)
[21:32:44] (PS3) Dzahn: ircecho: base::service_unit -> systemd::service [puppet] - https://gerrit.wikimedia.org/r/448770 (https://phabricator.wikimedia.org/T194724)
[21:34:39] (PS1) Andrew Bogott: labtest: yet more ipv6 ferm fixes [puppet] - https://gerrit.wikimedia.org/r/448777
[21:35:52] (CR) Andrew Bogott: [C: 2] labtest: yet more ipv6 ferm fixes [puppet] - https://gerrit.wikimedia.org/r/448777 (owner: Andrew Bogott)
[21:40:47] (PS1) Dzahn: mediawiki::cgroup: base::service_unit -> systemd::service [puppet] - https://gerrit.wikimedia.org/r/448778 (https://phabricator.wikimedia.org/T194724)
[21:44:50] (PS1) Dzahn: graphite::carbon_c_relay: base::service_unit -> systemd::service [puppet] - https://gerrit.wikimedia.org/r/448779 (https://phabricator.wikimedia.org/T194724)
[21:45:33] (PS2) Dzahn: proton: use ::profile::base::firewall [puppet] - https://gerrit.wikimedia.org/r/448158
[21:48:55] (CR) Dzahn: [C: 2] proton: use ::profile::base::firewall [puppet] - https://gerrit.wikimedia.org/r/448158 (owner: Dzahn)
[21:51:03] (PS2) Dzahn: sca/scb: use ::profile::base::firewall [puppet] - https://gerrit.wikimedia.org/r/448583
[21:54:06] (PS1) Dzahn: toollabs: use ::profile::base::firewall [puppet] - https://gerrit.wikimedia.org/r/448785
[21:56:23] (PS1) Dzahn: cluster::management: use ::profile::base::firewall [puppet] - https://gerrit.wikimedia.org/r/448787
[21:57:18] (CR) MarcoAurelio: [C: 1] "Looks good technically. But while I don't doubt your assesment in the task about people telling you, maybe it'd be good to have some sort " [mediawiki-config] - https://gerrit.wikimedia.org/r/448553 (https://phabricator.wikimedia.org/T200522) (owner: Amire80)
[21:58:47] (PS1) Dzahn: xdummy: base::service_unit -> systemd::service [puppet] - https://gerrit.wikimedia.org/r/448788 (https://phabricator.wikimedia.org/T194724)
[22:00:09] (PS1) Bstorm: WIP tooforge: start writing module [puppet] - https://gerrit.wikimedia.org/r/448791
[22:01:12] (CR) jerkins-bot: [V: -1] WIP tooforge: start writing module [puppet] - https://gerrit.wikimedia.org/r/448791 (owner: Bstorm)
[22:06:33] (PS2) Bstorm: WIP tooforge: start writing module [puppet] - https://gerrit.wikimedia.org/r/448791
[22:07:13] (CR) jerkins-bot: [V: -1] WIP tooforge: start writing module [puppet] - https://gerrit.wikimedia.org/r/448791 (owner: Bstorm)
[22:11:06] (PS1) Dzahn: simplelap: ensure apache php module is installed [puppet] - https://gerrit.wikimedia.org/r/448798
[22:11:34] (CR) Dzahn: [C: 2] "follow-up: https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/448798/" [puppet] - https://gerrit.wikimedia.org/r/448635 (owner: Dzahn)
[22:12:19] (CR) Dzahn: [C: 2] simplelap: ensure apache php module is installed [puppet] - https://gerrit.wikimedia.org/r/448798 (owner: Dzahn)
[22:14:53] (PS3) Bstorm: WIP tooforge: start writing module [puppet] - https://gerrit.wikimedia.org/r/448791
[22:15:37] (CR) jerkins-bot: [V: -1] WIP tooforge: start writing module [puppet] - https://gerrit.wikimedia.org/r/448791 (owner: Bstorm)
[22:15:50] (PS3) Dzahn: simplelamp: apache -> httpd module [puppet] - https://gerrit.wikimedia.org/r/415510
[22:19:39] (PS1) Dzahn: simplestatic: apache -> httpd module [puppet] - https://gerrit.wikimedia.org/r/448800
[22:19:59] (CR) Dzahn: "tested on cloud vps - Notice: /Stage[main]/Packages::Libapache2_mod_php7.0/Package[libapache2-mod-php7.0]/ensure: created" [puppet] - https://gerrit.wikimedia.org/r/415510 (owner: Dzahn)
[22:20:16] (CR) Dzahn: ""simplelap" that was, but same code" [puppet] - https://gerrit.wikimedia.org/r/415510 (owner: Dzahn)
[22:20:34] (CR) Dzahn: [C: 2] "tested on cloud VPS: Notice: /Stage[main]/Packages::Libapache2_mod_php7.0/Package[libapache2-mod-php7.0]/ensure: created" [puppet] - https://gerrit.wikimedia.org/r/448798 (owner: Dzahn)
[22:22:02] (Abandoned) BryanDavis: labswiki: Replace 'm5-master' CNAME with backing db name [mediawiki-config] - https://gerrit.wikimedia.org/r/417324 (owner: BryanDavis)
[22:22:54] (CR) Amire80: "Thanks. I have already started a discussion, and pinged some people over email:" [mediawiki-config] - https://gerrit.wikimedia.org/r/448553 (https://phabricator.wikimedia.org/T200522) (owner: Amire80)
[22:25:18] (PS4) Bstorm: WIP tooforge: start writing module [puppet] - https://gerrit.wikimedia.org/r/448791
[22:25:20] (CR) Dzahn: "only used by dashiki: https://tools.wmflabs.org/openstack-browser/puppetclass/role::simplestatic" [puppet] - https://gerrit.wikimedia.org/r/448800 (owner: Dzahn)
[22:25:58] (CR) jerkins-bot: [V: -1] WIP tooforge: start writing module [puppet] - https://gerrit.wikimedia.org/r/448791 (owner: Bstorm)
[22:33:37] PROBLEM - https://grafana.wikimedia.org/dashboard/db/varnish-http-requests grafana alert on einsteinium is CRITICAL: CRITICAL: https://grafana.wikimedia.org/dashboard/db/varnish-http-requests is alerting: 70% GET drop in 30min alert.
[22:34:47] RECOVERY - https://grafana.wikimedia.org/dashboard/db/varnish-http-requests grafana alert on einsteinium is OK: OK: https://grafana.wikimedia.org/dashboard/db/varnish-http-requests is not alerting.
[22:36:30] we had one of these earlier. the type of alert/check is new to me though. when i zoomed out in the graph earlier it didn't look like a very unusual pattern (and apparently still can have -70% within 30 min)
[22:36:44] feels like it just needs more tuning
[22:38:04] (CR) Dzahn: [C: 2] simplestatic: apache -> httpd module [puppet] - https://gerrit.wikimedia.org/r/448800 (owner: Dzahn)
[22:38:24] (CR) Dzahn: [C: 2] ""labs-only" and only one project" [puppet] - https://gerrit.wikimedia.org/r/448800 (owner: Dzahn)
[22:39:18] PROBLEM - https://grafana.wikimedia.org/dashboard/db/varnish-http-requests grafana alert on einsteinium is CRITICAL: CRITICAL: https://grafana.wikimedia.org/dashboard/db/varnish-http-requests is alerting: 70% GET drop in 30min alert.
[22:40:28] RECOVERY - https://grafana.wikimedia.org/dashboard/db/varnish-http-requests grafana alert on einsteinium is OK: OK: https://grafana.wikimedia.org/dashboard/db/varnish-http-requests is not alerting.
[22:40:57] and that doesnt seem true when looking at graph
[22:41:58] it's between 12 and 15 Mil..not -70% afaict
[22:43:28] (CR) Dzahn: [C: 2] "checking on dashiki-01 and dashiki-staging-01 VPS" [puppet] - https://gerrit.wikimedia.org/r/448800 (owner: Dzahn)
[22:46:14] (CR) Dzahn: [C: 1] "this change has always been noop everywhere else because all this is is:" [puppet] - https://gerrit.wikimedia.org/r/448785 (owner: Dzahn)
[22:48:20] (PS1) Dzahn: jobqueue_redis: use ::profile::base::firewall [puppet] - https://gerrit.wikimedia.org/r/448809
[22:50:48] (PS1) Dzahn: docker::registry: use ::profile::base::firewall [puppet] - https://gerrit.wikimedia.org/r/448810
[22:56:08] (PS1) Dzahn: toolserver_legacy: apache -> httpd module [puppet] - https://gerrit.wikimedia.org/r/448811
[23:16:29] (PS1) Dzahn: switch terbium to a spare system [puppet] - https://gerrit.wikimedia.org/r/448816 (https://phabricator.wikimedia.org/T192092)
[23:16:47] PROBLEM - https://grafana.wikimedia.org/dashboard/db/varnish-http-requests grafana alert on einsteinium is CRITICAL: CRITICAL: https://grafana.wikimedia.org/dashboard/db/varnish-http-requests is alerting: 70% GET drop in 30min alert.
[23:17:57] RECOVERY - https://grafana.wikimedia.org/dashboard/db/varnish-http-requests grafana alert on einsteinium is OK: OK: https://grafana.wikimedia.org/dashboard/db/varnish-http-requests is not alerting.