[00:00:09] AndyRussG|a-whey: I don't think it's a MaxMind problem (as in, something their support can help you with). There are probably many valid theories on correlating "unknown region" with lower ad impressions (e.g. anonymizing proxies, hosting in datacenters, etc...), but the whole reason MaxMind put them in the unknown region category is because they don't know much about them...
[00:00:50] I see that it's a statistical anomaly in some sense, but that doesn't mean it's an actual problem to pursue
[00:01:29] bblack: hmmmm right... I mean, it would be a problem if those were actual mobile users that didn't get impressions but should have
[00:02:10] I don't think it's a problem that MaxMind is causing, but I just thought they might be willing to give us some insight into what those Unknown regions have in common
[00:02:34] At first I thought it was slower connections, but that was mistaken (see further up in the task)
[00:02:42] Also California doesn't look great
[00:03:32] Was planning to do some host lookups for the IP addresses in those low-impression regions, see if anything jumped out
[00:03:52] there are probably many causes
[00:04:14] hmmmm right
[00:04:41] I wonder how they actually get a location from an IP? Do they have agreements with ISPs to get that data?
[00:04:57] the point is, they're unknowns. but by not being the very standard case of normal client IPs on normal geolocated consumer networks, they're probably generally more likely to be: running bots/code rather than humans+UAs, and/or running various ad-block methods (possibly while anonymizing traffic), etc
[00:05:00] Are mobile networks inherently less locatable because it's understood the device moves around?
[00:05:50] So "Unknown" does most likely mean, not a normal client ISP?
[00:05:51] AndyRussG: there are many networks that are inherently less-locatable. "How they get the data" is pretty much how they exist - the core value they provide as a company is that they aggregate public data and whatever magic sauce they can glean from everywhere else they can into a better info source than you can find elsewhere.
[00:06:18] heh right
[00:06:26] * AndyRussG sniffs for secret ingredients
[00:06:36] unknown region is the least of them. some IPs are also unknown country/continent. They have a few subcategories of those for anonymizing proxies, satellite link providers, truly-unknown, etc
[00:07:10] that we get the level of geolocation data we do at all is a miracle :)
[00:07:42] hmmm right
[00:08:10] RECOVERY - puppet last run on cp3020 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[00:08:39] They don't tell us which of those subcategories an IP is in?
[00:08:59] they do in some cases when the country isn't known, by using the "fake" country codes A1, A2, etc
[00:09:09] Ah right
[00:09:23] http://dev.maxmind.com/geoip/legacy/codes/iso3166/
[00:09:26] ^ at the top there
[00:10:00] K
[00:10:11] but I think there are probably a lot of cases that are similar in nature to A1/A2, but where they know the basic country-level info from public sources, but not much beyond that. And they don't tend to have sub-country-level regions with ids like A1/A2
[00:10:26] mmm
[00:11:15] anyways, I have to run, just my $0.02, but I wouldn't worry about it
[00:11:23] Basically at this point we have a few action items... one is to be sure the low impression rate is not a problem with our code/setup (which would be the case if it corresponded to bad network speed, latency)
[00:11:24] there are unknown unknowns, this is the internet
[00:11:29] bblack: ah K thx much!
[00:11:49] Really appreciate ur thoughts on this!! :)
[00:12:08] the only way to know that (e.g. network speed) would be to test that hypothesis in data you do have that differentiates it. looking at unknowns is like staring into the void.
[00:13:04] (PS3) Dzahn: base: add lshw to standard packages [puppet] - https://gerrit.wikimedia.org/r/328952
[00:13:05] bblack: we did do a comparison on network speed (using data from NavigationTiming, responseStart timing)
[00:13:23] The only correlation was for that specific Unknown region (which had slow speed)
[00:13:42] while other low impression regions had normal speed, and other slow speed regions had normal impressions
[00:14:12] But I don't know if there are other measures of network quality that we could be missing. For example, an extra DNS lookup is needed to load a banner
[00:14:24] (since it comes from Meta)
[00:19:34] (PS1) BryanDavis: logstash: Add a json_lines TCP input [puppet] - https://gerrit.wikimedia.org/r/329020 (https://phabricator.wikimedia.org/T151422)
[00:21:08] AndyRussG: bblack: the MaxMind talk reminded me of this article about the couple in Kansas who happen to live at the coordinates that MaxMind picked as "center of the US" and because tons of things get located to "US" but without more details they all end up at the center. So in real life that turned their life into hell.
[00:21:13] "The following events appeared to originate at the residence and brought trespassers and/or law enforcement to the plaintiffs’ home at all hours of the night and day: stolen cars, fraud related to tax returns and bitcoin, stolen credit cards, suicide calls, private investigators, stolen social media accounts, fund raising events, and numerous other events."
[00:21:18] http://arstechnica.com/tech-policy/2016/08/kansas-couple-sues-ip-mapping-firm-for-turning-their-life-into-a-digital-hell/
[00:22:44] (PS4) BryanDavis: logstash: Parse nginx access logs for wdqs [puppet] - https://gerrit.wikimedia.org/r/299825
[00:27:17] "To its credit, MaxMind has since fixed the error in its IP databases by moving the location of a default IP address to the middle of a Kansas lake" "How is that better? Just tell your users that it's unknown / null. Or fall back to something a little more obviously fake, like 0, 0."
[00:42:27] (PS1) Alex Monk: toollabs: Don't use wikitech API to find labs instances in tools-clush-generator [puppet] - https://gerrit.wikimedia.org/r/329021 (https://phabricator.wikimedia.org/T104575)
[00:45:07] mutante: heheh that's hilarious! ;p
[00:45:26] not so much for the couple
[00:45:45] godog, hey, do you happen to still be around?
[00:46:43] Krenair: not for long still :)
[00:47:07] Operations: debdeploy should show which servers need service restarts - https://phabricator.wikimedia.org/T154068#2900051 (Dzahn)
[00:47:16] Operations: debdeploy should show which servers need service restarts - https://phabricator.wikimedia.org/T154068#2900063 (Dzahn) p:Triage>Low
[00:47:16] godog, you know prometheus-labs-targets?
[00:47:26] Krenair: yeah I wrote it
[00:47:35] ok :)
[00:47:53] well it uses the wikitech novainstances API
[00:48:34] ah, I'm assuming that'd need migration
[00:48:41] we're able to call OpenStack directly now, and there's https://phabricator.wikimedia.org/T104575
[00:49:34] Operations: debdeploy should show which servers need service restarts - https://phabricator.wikimedia.org/T154068#2900064 (Dzahn)
[00:49:51] (openstack doesn't have a proper guest account, but we now have our own called novaobserver with a public password and read-only privileges)
[00:50:09] the thing is I don't want to duplicate that password everywhere
[00:51:06] (PS1) MaxSem: Upgrade Collection's license URL to HTTPS [mediawiki-config] - https://gerrit.wikimedia.org/r/329023
[00:51:07] so it'd need to be read from a file or passed in somehow
[00:52:11] fair, yeah either a file or the environment I'd say, the former is more secure as you can use permissions, the latter is available to any user but meh I guess it doesn't matter in this case
[00:53:33] given the number of callers across the repo I'm probably just going to set up an extra file containing it
[00:54:26] could template it in, but ugh
[00:54:35] the script is written to /usr/local/bin/prometheus-labs-targets, any preferences for this?
[00:55:34] not really no, the file sounds good to me
[00:57:35] ok
[01:00:28] have a good holiday g.odog
[01:00:47] Krenair: are all callers python btw? if that's the case it might make sense a separate module that does the setup and returns e.g. keystone sessions
[01:00:54] Krenair: you too!
[01:00:56] * godog off
[01:01:27] yeah I've been considering doing basically that instead, my commits are getting repetitive
[01:05:10] PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 3 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[wipe],Package[zotero/translators],Package[zotero/translation-server]
[01:05:18] actually there's most of one already in there
[01:06:27] nope, the whole thing. okay...
[01:07:17] oh
[01:07:18] python2
[01:12:14] all of these are written for python3
[01:23:02] (CR) Alex Monk: [C: -1] "these won't work because we don't have the packages installed for python3. we should also be using the mwopenstackclients module once that" [puppet] - https://gerrit.wikimedia.org/r/328608 (https://phabricator.wikimedia.org/T104575) (owner: Alex Monk)
[01:23:05] (CR) Alex Monk: [C: -1] "these won't work because we don't have the packages installed for python3. we should also be using the mwopenstackclients module once that" [puppet] - https://gerrit.wikimedia.org/r/328609 (https://phabricator.wikimedia.org/T104575) (owner: Alex Monk)
[01:23:12] (CR) Alex Monk: [C: -1] "these won't work because we don't have the packages installed for python3. we should also be using the mwopenstackclients module once that" [puppet] - https://gerrit.wikimedia.org/r/328611 (owner: Alex Monk)
[01:23:15] (CR) Alex Monk: [C: -1] "these won't work because we don't have the packages installed for python3. we should also be using the mwopenstackclients module once that" [puppet] - https://gerrit.wikimedia.org/r/329021 (https://phabricator.wikimedia.org/T104575) (owner: Alex Monk)
[01:26:50] PROBLEM - puppet last run on elastic1021 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:33:10] RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 35 seconds ago with 0 failures
[01:38:28] (PS1) Ladsgroup: Change VE tabs default preferences to multitab in fawiki [mediawiki-config] - https://gerrit.wikimedia.org/r/329026 (https://phabricator.wikimedia.org/T154070)
[01:47:38] left a note on the thread
[01:49:20] PROBLEM - puppet last run on multatuli is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[01:54:50] RECOVERY - puppet last run on elastic1021 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[02:18:24] RECOVERY - puppet last run on multatuli is OK: OK: Puppet is currently enabled, last run 58 seconds ago with 0 failures
[02:41:40] PROBLEM - Ubuntu mirror in sync with upstream on sodium is CRITICAL: /srv/mirrors/ubuntu is over 14 hours old.
[03:34:10] PROBLEM - puppet last run on phab2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[04:03:10] RECOVERY - puppet last run on phab2001 is OK: OK: Puppet is currently enabled, last run 39 seconds ago with 0 failures
[04:08:40] PROBLEM - mailman I/O stats on fermium is CRITICAL: CRITICAL - I/O stats: Transfers/Sec=2533.40 Read Requests/Sec=475.50 Write Requests/Sec=0.50 KBytes Read/Sec=39975.60 KBytes_Written/Sec=6.80
[04:13:50] PROBLEM - puppet last run on labsdb1004 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues
[04:19:40] RECOVERY - mailman I/O stats on fermium is OK: OK - I/O stats: Transfers/Sec=157.30 Read Requests/Sec=311.30 Write Requests/Sec=1.80 KBytes Read/Sec=2257.20 KBytes_Written/Sec=354.40
[04:41:50] RECOVERY - puppet last run on labsdb1004 is OK: OK: Puppet is currently enabled, last run 13 seconds ago with 0 failures
[05:40:20] PROBLEM - puppet last run on sca2004 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[wipe],Package[zotero/translators],Package[zotero/translation-server]
[05:53:20] PROBLEM - Redis status tcp_6479 on rdb2006 is CRITICAL: CRITICAL: replication_delay is 653 600 - REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 3763271 keys, up 53 days 21 hours - replication_delay is 653
[05:55:20] RECOVERY - Redis status tcp_6479 on rdb2006 is OK: OK: REDIS 2.8.17 on 10.192.48.44:6479 has 1 databases (db0) with 3709274 keys, up 53 days 21 hours - replication_delay is 58
[06:07:20] RECOVERY - puppet last run on sca2004 is OK: OK: Puppet is currently enabled, last run 37 seconds ago with 0 failures
[06:43:40] RECOVERY - Ubuntu mirror in sync with upstream on sodium is OK: /srv/mirrors/ubuntu is over 0 hours old.
[08:03:00] elukey _joe_: mw1259 suddenly got loaded. did anything change? https://ganglia.wikimedia.org/latest/?r=2hr&cs=&ce=&m=cpu_report&c=Video+scalers+eqiad&h=mw1259.eqiad.wmnet&tab=m&vn=&hide-hf=false&mc=2&z=medium&metric_group=NOGROUPS
[08:05:10] PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[wipe],Package[zotero/translators],Package[zotero/translation-server]
[08:32:30] RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[08:55:00] PROBLEM - puppet last run on sca1003 is CRITICAL: CRITICAL: Puppet has 20 failures. Last run 2 minutes ago with 20 failures. Failed resources (up to 3 shown): Package[sysstat],Package[molly-guard],Package[lldpd],Package[ncdu]
[09:12:01] zhuyifei1999_: seems a temp spike, I wouldn't worry - https://grafana.wikimedia.org/dashboard/file/server-board.json?var-server=mw1259&var-network=eth0&from=now-6h&to=now
[09:13:09] (didn't find also any errors in the apache log)
[09:13:18] (afk again, will read later on :)
[09:14:23] (so let's see if it resolves later on)
[09:22:00] RECOVERY - puppet last run on sca1003 is OK: OK: Puppet is currently enabled, last run 12 seconds ago with 0 failures
[10:36:06] (CR) Volans: [C: -1] "Comments on the topk.py file inline." (20 comments) [puppet] - https://gerrit.wikimedia.org/r/328660 (https://phabricator.wikimedia.org/T147366) (owner: Mobrovac)
[12:40:20] PROBLEM - puppet last run on sca2004 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[13:08:20] RECOVERY - puppet last run on sca2004 is OK: OK: Puppet is currently enabled, last run 51 seconds ago with 0 failures
[13:53:10] PROBLEM - puppet last run on ocg1001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:22:10] RECOVERY - puppet last run on ocg1001 is OK: OK: Puppet is currently enabled, last run 55 seconds ago with 0 failures
[14:31:00] PROBLEM - Check whether ferm is active by checking the default input chain on analytics1033 is CRITICAL: ERROR ferm input drop default policy not set, ferm might not have been started correctly
[14:32:00] RECOVERY - Check whether ferm is active by checking the default input chain on analytics1033 is OK: OK ferm input default policy is set
[14:41:30] PROBLEM - puppet last run on es1018 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[14:54:30] PROBLEM - puppet last run on wdqs1002 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:05:30] PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[wipe],Package[zotero/translators],Package[zotero/translation-server]
[15:09:30] RECOVERY - puppet last run on es1018 is OK: OK: Puppet is currently enabled, last run 20 seconds ago with 0 failures
[15:14:20] PROBLEM - puppet last run on dbproxy1010 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[15:22:30] RECOVERY - puppet last run on wdqs1002 is OK: OK: Puppet is currently enabled, last run 38 seconds ago with 0 failures
[15:33:30] RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 43 seconds ago with 0 failures
[15:42:20] RECOVERY - puppet last run on dbproxy1010 is OK: OK: Puppet is currently enabled, last run 14 seconds ago with 0 failures
[15:53:20] PROBLEM - puppet last run on mw1256 is CRITICAL: CRITICAL: Catalog fetch fail.
Either compilation failed or puppetmaster has issues
[16:22:20] RECOVERY - puppet last run on mw1256 is OK: OK: Puppet is currently enabled, last run 56 seconds ago with 0 failures
[16:32:30] PROBLEM - puppet last run on nescio is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[17:01:30] RECOVERY - puppet last run on nescio is OK: OK: Puppet is currently enabled, last run 36 seconds ago with 0 failures
[17:09:17] Operations, Traffic, Malayam-Sites, Mobile, Patch-For-Review: ml.wikipedia.org not redirecting to mobile site while accessing from a mobile device; many "Error: Module not found" errors - https://phabricator.wikimedia.org/T115191#2900674 (MarcoAurelio)
[17:11:03] Operations, Wikimedia-General-or-Unknown, I18n, Malayam-Sites, and 2 others: Update Malayalam fonts packages - https://phabricator.wikimedia.org/T33950#2900726 (MarcoAurelio)
[17:14:07] Operations, Wikimedia-General-or-Unknown, I18n, Malayam-Sites, and 2 others: Update Malayalam fonts packages - https://phabricator.wikimedia.org/T33950#2900820 (MarcoAurelio)
[18:25:10] PROBLEM - puppet last run on db1011 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[18:35:30] PROBLEM - puppet last run on contint2001 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[18:51:30] RECOVERY - puppet last run on contint2001 is OK: OK: Puppet is currently enabled, last run 53 seconds ago with 0 failures
[18:53:10] RECOVERY - puppet last run on db1011 is OK: OK: Puppet is currently enabled, last run 5 seconds ago with 0 failures
[19:13:50] PROBLEM - check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.007 second response time
[19:17:48] (PS8) Marostegui: Reporting tests with the private data script [puppet] - https://gerrit.wikimedia.org/r/328352 (https://phabricator.wikimedia.org/T153680)
[19:17:50] RECOVERY - check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.006 second response time
[19:21:17] (CR) Marostegui: [C: 1] Rename ferm service in role::labs::db::replica [puppet] - https://gerrit.wikimedia.org/r/328683 (owner: Muehlenhoff)
[19:29:50] PROBLEM - check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.005 second response time
[19:49:58] (PS1) Yuvipanda: Fix locale for debian based images [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/329062
[19:50:28] zhuyifei1999_: ^ should work, I'm going to build it now
[19:50:52] ok
[19:51:17] thx, oh and merry christmas :)
[19:53:31] "Cultural imperialism, etc" o.O
[19:53:57] zhuyifei1999_: a joke about how en_US is what we pick :)
[19:54:47] lol indeed
[19:55:46] zhuyifei1999_: am building the image, will push in 10 mins or so
[19:56:02] k
[20:08:50] RECOVERY - check mtime mod from tools cron job on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.006 second response time
[20:10:04] (PS2) Yuvipanda: Fix locale for debian based images [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/329062 (https://phabricator.wikimedia.org/T154088)
[20:10:24] (CR) Yuvipanda: [V: 2 C: 2] Fix locale for debian based images [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/329062 (https://phabricator.wikimedia.org/T154088) (owner: Yuvipanda)
[20:11:02] (PS3) Yuvipanda: Fix locale for debian based images [docker-images/toollabs-images] - https://gerrit.wikimedia.org/r/329062 (https://phabricator.wikimedia.org/T154088)
[20:21:00] PROBLEM - check mtime mod from tools cron job on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/toolscron - 185 bytes in 0.005 second response time
[20:23:30] PROBLEM - puppet last run on uranium is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[20:51:30] RECOVERY - puppet last run on uranium is OK: OK: Puppet is currently enabled, last run 15 seconds ago with 0 failures
[21:59:30] PROBLEM - puppet last run on ms-be1014 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[22:04:30] PROBLEM - puppet last run on sca1004 is CRITICAL: CRITICAL: Puppet has 27 failures. Last run 2 minutes ago with 27 failures. Failed resources (up to 3 shown): Exec[eth0_v6_token],Package[wipe],Package[zotero/translators],Package[zotero/translation-server]
[22:06:20] PROBLEM - OCG health on ocg1002 is CRITICAL: CRITICAL: ocg_job_status 1026877 msg (=800000 warning): ocg_render_job_queue 3068 msg (=3000 critical)
[22:06:20] PROBLEM - OCG health on ocg1001 is CRITICAL: CRITICAL: ocg_job_status 1026891 msg (=800000 warning): ocg_render_job_queue 3079 msg (=3000 critical)
[22:06:30] PROBLEM - OCG health on ocg1003 is CRITICAL: CRITICAL: ocg_job_status 1026939 msg (=800000 warning): ocg_render_job_queue 3084 msg (=3000 critical)
[22:11:52] Operations, ops-codfw, DBA: db2060 crashed (RAID controller) - https://phabricator.wikimedia.org/T154031#2901122 (Marostegui) We should restart the server anyways so we can probably take advantage of that and upgrade whatever needs some upgrade: ``` Cache Status Details: The current array control...
[22:23:30] PROBLEM - puppet last run on mw1297 is CRITICAL: CRITICAL: Catalog fetch fail. Either compilation failed or puppetmaster has issues
[22:27:30] RECOVERY - puppet last run on ms-be1014 is OK: OK: Puppet is currently enabled, last run 1 second ago with 0 failures
[22:32:30] RECOVERY - puppet last run on sca1004 is OK: OK: Puppet is currently enabled, last run 34 seconds ago with 0 failures
[22:52:30] RECOVERY - puppet last run on mw1297 is OK: OK: Puppet is currently enabled, last run 42 seconds ago with 0 failures