Slow SQL, trying to understand what is going on - MySQL

I have an SQL query that started to run very slowly on the production server (>20 s). When I run it locally on my laptop against the same DB, it finishes in under 1 s.
I know the SQL is not optimal; I tried running EXPLAIN, but I don't quite understand the result.
SELECT ST.id,
ST.lot_id,
ST.warehouse_location_id,
WL.name,
P.id AS product_id,
P.sku,
P.code,
P.NAME AS product_name,
P.has_serial,
Sum(ST.quantity) AS qty,
Group_concat(ST.id) AS stock_transaction_ids
FROM stock_transactions ST LEFT JOIN products P
ON (P.id = ST.product_id)
LEFT JOIN warehouse_locations WL
ON (WL.id = ST.warehouse_location_id)
LEFT JOIN
(
SELECT stock_transaction_id,
Group_concat(serial_id) AS serials
FROM stock_transactions_serials
GROUP BY stock_transaction_id ) STS
ON STS.stock_transaction_id = ST.id
WHERE (
ST.document_type = "Goods transfer" AND
ST.document_id IN (806, 807, 808, 809, 810, 811, 812, 824, 827, 831, 835, 838, 839, 841, 868, 869, 870, 871, 872, 873, 874, 875, 876, 877, 878, 879, 880, 881, 882, 883, 884, 885, 886, 887, 888, 889, 890, 944, 945, 946, 954, 955, 956, 957, 961, 965, 966, 967, 2240, 2241, 2242, 2243, 2244, 2245, 2246, 2247, 2248, 2249, 2250, 2251, 2252, 2253, 2254, 2255, 2256, 2257, 2258, 2259, 2260, 2261, 2262, 2263, 2264, 2265, 2266, 2267, 2268, 2269, 2270, 2271, 2272, 2273, 2274, 2275, 2276, 2277, 2278, 2279, 2280, 2281, 2282, 2283, 2284, 2285, 2286, 2287, 2288, 2289, 2290, 2291, 2292, 2293, 2294, 2295, 2296, 2297, 2298, 2299, 2300, 2301, 2302, 2303, 2304, 2305, 2306, 2307, 2308, 2309, 2310, 2311, 2312, 2313, 2314, 2315, 2316, 2317, 2318, 2319, 2320, 2321, 2322, 2323, 2324, 2325, 2326, 2327, 2328, 2329, 2330, 2331, 2332, 2333, 2334, 2335, 2337, 2338, 2339, 2340, 2341, 2342, 2343, 2344, 2345, 2346, 2347, 2348, 2349, 2350, 2351, 2352, 2353, 2354, 2355, 2356, 2357, 2358, 2359, 2360, 2361, 2362, 2363, 2364, 2365, 2366, 2367, 2368, 2369, 2370, 2371, 2372, 2373, 2374, 2375, 2376, 2377, 2378, 2379, 2380, 2381, 2382, 2383, 2384, 2385, 2386, 2387, 2388, 2389, 2390, 2391, 2392, 2393, 2394, 2395, 2396, 2397, 2398, 2399, 2400, 2401, 2402, 2403, 2404, 2405, 2406, 2407, 2408, 2409, 2410, 2411, 2412, 2413, 2414, 2415, 2416, 2417, 2418, 2419, 2420, 2421, 2422, 2423, 2424, 2425, 2426, 2427, 2428, 2429, 2430, 2431, 2432, 2437, 2438, 2439, 2440, 2441, 2442, 2443, 2444, 2445, 2446, 2447, 2448, 2449, 2450, 2451, 2452, 2453, 2454, 2455, 2456, 2457, 2458, 2459, 2460, 2461, 2462, 2463, 2464, 2465, 2466, 2467, 2468, 2469, 2470, 2471, 2472, 2473, 2474, 2475, 2476, 2477, 2478, 2479, 2480, 2481, 2482, 2483, 2484, 2485, 2486, 2487, 2488, 2489, 2490, 2491, 2492, 2493, 2494, 2495, 2496, 2497, 2498, 2499, 2500, 2501, 2502, 2503, 2504, 2505, 2506, 2507, 2508, 2509, 2510, 2511, 2512, 2513, 2514, 2515, 2516, 2517, 2518, 2519, 2520, 2521, 2522, 2523, 2524, 2525, 2526, 2527, 2528, 2529, 2530, 2531, 2532, 2534, 2535, 2536, 2537, 2538, 2539, 2540, 2541, 2542, 2543, 2544, 2545, 2546, 2547, 2548, 2549, 2550, 2551, 2552, 2553, 2554, 2555, 2556, 2557, 2558, 2559, 2560, 2561, 2562, 2563, 2564, 2565, 2566, 2567, 2568, 2569, 2570, 2571, 2572, 2573, 2574, 2575, 2576, 2577, 2578, 2579, 2580, 2581, 2583, 2584, 2585, 2586, 2587, 2588, 2589, 2590, 2591, 2592, 2593, 2594, 2595, 2596, 2597, 2598, 2599, 2600, 2601, 2602, 2603, 2604, 2605, 2606, 2607, 2608, 2609, 2610, 2611, 2612, 2613, 2614, 2615, 2616, 2617, 2618, 2619, 2620, 2621, 2622, 2623, 2624, 2625, 2626, 2627, 2628, 2629, 2630, 3707, 3708, 10194, 10244, 10246, 10247, 10248, 10249, 10426, 10428, 27602, 27603, 28587, 28588, 28589, 28590, 28591, 28592, 28593, 28594, 28595, 28596, 28597, 28598, 28599, 28600, 28601, 28602, 28603, 28604, 28605, 28606, 28607, 28608, 28609, 28610, 28611, 28612, 28613, 28614, 28615, 28616, 28617, 28618, 28619, 28620, 28621, 28622, 28623, 28624, 28625, 28626, 28627, 28628, 28629, 28630, 28631, 28632, 28633, 28634, 28635, 28636, 28637, 28638, 28639, 28640, 28641, 28642, 28643) )
GROUP BY ST.product_id,
ST.lot_id,
ST.warehouse_location_id
HAVING qty > 0
EXPLAIN gets me this: [EXPLAIN output screenshot not included]
Can someone help me understand what's going on? I will see if I can come back with an SQL fiddle as well.

It looks like the problem is your derived table:
(
SELECT stock_transaction_id,
Group_concat(serial_id) AS serials
FROM stock_transactions_serials
GROUP BY stock_transaction_id ) STS
ON STS.stock_transaction_id = ST.id
I can't see serials used anywhere else in the query, so I'm guessing this derived table is not doing much for you, but its cost is high: the LEFT JOIN onto a derived table isn't using an index. I'd suggest rewriting it as a direct join on stock_transactions_serials rather than a derived table.
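In fact, since serials is never selected and the derived table is grouped on the join key (so the LEFT JOIN can neither add nor drop rows), the cheapest fix is simply to remove it. A sketch, untested against your schema:
SELECT ST.id,
       ST.lot_id,
       ST.warehouse_location_id,
       WL.name,
       P.id AS product_id,
       P.sku,
       P.code,
       P.name AS product_name,
       P.has_serial,
       SUM(ST.quantity) AS qty,
       GROUP_CONCAT(ST.id) AS stock_transaction_ids
FROM stock_transactions ST
LEFT JOIN products P ON P.id = ST.product_id
LEFT JOIN warehouse_locations WL ON WL.id = ST.warehouse_location_id
WHERE ST.document_type = "Goods transfer"
  AND ST.document_id IN (806, 807, 808) -- full list as in the question
GROUP BY ST.product_id, ST.lot_id, ST.warehouse_location_id
HAVING qty > 0;
If you do need the serials, join stock_transactions_serials directly so an index on stock_transaction_id can be used; note that SUM(ST.quantity) would then be inflated for transactions with several serials, so you would have to pre-aggregate in that case.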

You have
ON STS.stock_transaction_id = ST.id
WHERE (
ST.document_type = "Goods transfer" AND
ST.document_id IN (806, 807, 808, 809, 810, 811, 812, -- etc
If these values are in goods_transfer_items in a field called id, then this would be functionally identical:
ON STS.stock_transaction_id = ST.id
JOIN goods_transfer_items ON goods_transfer_items.id = ST.document_id
AND goods_transfer_items.work_order_number = 12345
AND ST.document_type = "Goods transfer"
Not only is this much shorter, but it will also be much faster, because SQL servers can optimize joins.
This is functionally IDENTICAL to the IN list.
Prior answer below:
You should take the values listed here
ST.document_id IN (806, 807, 808, 809, 810, 811, 812, ..., 28643) -- the full list from the question
Put them in a table, create an index on it, and then inner join to it instead of using IN.
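A sketch of that approach (wanted_documents is an illustrative name; adjust to your schema):
CREATE TEMPORARY TABLE wanted_documents (
    document_id INT NOT NULL,
    PRIMARY KEY (document_id)
);

INSERT INTO wanted_documents (document_id)
VALUES (806), (807), (808); -- ...one row per id in the list

SELECT ST.product_id,
       ST.lot_id,
       ST.warehouse_location_id,
       SUM(ST.quantity) AS qty -- other columns and joins as in the original query
FROM stock_transactions ST
JOIN wanted_documents WD ON WD.document_id = ST.document_id
WHERE ST.document_type = "Goods transfer"
GROUP BY ST.product_id, ST.lot_id, ST.warehouse_location_id
HAVING qty > 0;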

Trying to convert a weirdly nested or structured json dict into a csv or xlsx file

Hi, I have this JSON dict that I would like to simply reproduce in an xlsx or csv file (whatever is easier), but it's so weirdly structured that I have no idea how to format it. This is a snippet of it; it's very long and continues in the same structure:
{'status': {'timestamp': '2022-10-03T11:45:57.639Z', 'error_code': 0, 'error_message': None, 'elapsed': 122, 'credit_count': 25, 'notice': None, 'total_count': 9466}, 'data': [{'id': 1, 'name': 'Bitcoin', 'symbol': 'BTC', 'slug': 'bitcoin', 'num_market_pairs': 9758, 'date_added': '2013-04-28T00:00:00.000Z', 'tags': ['mineable', 'pow', 'sha-256', 'store-of-value', 'state-channel', 'coinbase-ventures-portfolio', 'three-arrows-capital-portfolio', 'polychain-capital-portfolio', 'binance-labs-portfolio', 'blockchain-capital-portfolio', 'boostvc-portfolio', 'cms-holdings-portfolio', 'dcg-portfolio', 'dragonfly-capital-portfolio', 'electric-capital-portfolio', 'fabric-ventures-portfolio', 'framework-ventures-portfolio', 'galaxy-digital-portfolio', 'huobi-capital-portfolio', 'alameda-research-portfolio', 'a16z-portfolio', '1confirmation-portfolio', 'winklevoss-capital-portfolio', 'usv-portfolio', 'placeholder-ventures-portfolio', 'pantera-capital-portfolio', 'multicoin-capital-portfolio', 'paradigm-portfolio'], 'max_supply': 21000000, 'circulating_supply': 19167806, 'total_supply': 19167806, 'platform': None, 'cmc_rank': 1, 'self_reported_circulating_supply': None, 'self_reported_market_cap': None, 'tvl_ratio': None, 'last_updated': '2022-10-03T11:43:00.000Z', 'quote': {'USD': {'price': 19225.658331409155, 'volume_24h': 24499551567.663418, 'volume_change_24h': 31.8917, 'percent_change_1h': 0.17357826, 'percent_change_24h': 0.07206242, 'percent_change_7d': 1.89824678, 'percent_change_30d': -3.09210177, 'percent_change_60d': -16.08415351, 'percent_change_90d': -2.52728996, 'market_cap': 368513689118.7344, 'market_cap_dominance': 39.6701, 'fully_diluted_market_cap': 403738824959.59, 'tvl': None, 'last_updated': '2022-10-03T11:43:00.000Z'}}}, {'id': 1027, 'name': 'Ethereum', 'symbol': 'ETH', 'slug': 'ethereum', 'num_market_pairs': 6121, 'date_added': '2015-08-07T00:00:00.000Z', 'tags': ['pos', 'smart-contracts', 'ethereum-ecosystem', 'coinbase-ventures-portfolio', 'three-arrows-capital-portfolio', 'polychain-capital-portfolio', 'binance-labs-portfolio', 'blockchain-capital-portfolio', 'boostvc-portfolio', 'cms-holdings-portfolio', 'dcg-portfolio', 'dragonfly-capital-portfolio', 'electric-capital-portfolio', 'fabric-ventures-portfolio', 'framework-ventures-portfolio', 'hashkey-capital-portfolio', 'kenetic-capital-portfolio', 'huobi-capital-portfolio', 'alameda-research-portfolio', 'a16z-portfolio', '1confirmation-portfolio', 'winklevoss-capital-portfolio', 'usv-portfolio', 'placeholder-ventures-portfolio', 'pantera-capital-portfolio', 'multicoin-capital-portfolio', 'paradigm-portfolio', 'injective-ecosystem'], 'max_supply': None, 'circulating_supply': 122632957.499, 'total_supply': 122632957.499, 'platform': None, 'cmc_rank': 2, 'self_reported_circulating_supply': None, 'self_reported_market_cap': None, 'tvl_ratio': None, 'last_updated': '2022-10-03T11:43:00.000Z', 'quote': {'USD': {'price': 1296.4468710090778, 'volume_24h': 8517497687.565527, 'volume_change_24h': 23.596, 'percent_change_1h': 0.1720414, 'percent_change_24h': -0.21259957, 'percent_change_7d': 0.14320028, 'percent_change_30d': -16.39161383, 'percent_change_60d': -19.95869375, 'percent_change_90d': 15.00727432, 'market_cap': 158987114032.16776, 'market_cap_dominance': 17.1131, 'fully_diluted_market_cap': 158987114032.17, 'tvl': None, 'last_updated': '2022-10-03T11:43:00.000Z'}}}, {'id': 825, 'name': 'Tether', 'symbol': 'USDT', 'slug': 'tether', 'num_market_pairs': 40432, 'date_added': '2015-02-25T00:00:00.000Z', 'tags': ['payments', 
'stablecoin', 'asset-backed-stablecoin', 'avalanche-ecosystem', 'solana-ecosystem', 'arbitrum-ecosytem', 'moonriver-ecosystem', 'injective-ecosystem', 'bnb-chain', 'usd-stablecoin'], 'max_supply': None, 'circulating_supply': 67949424437.85899, 'total_supply': 70155449906.09953, 'platform': .....to be continued
This is all I have:
.....
data = json.loads(response.text)
df = pd.json_normalize(data)
path = "C:\\Users\\NIWE\\Desktop\\Python\\PLS.xlsx"
writer = pd.ExcelWriter(path, engine="xlsxwriter")
df.to_excel(writer)
writer.save()
#writer.close()
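A sketch of one way to flatten this structure, assuming the response layout shown above: normalizing the 'data' list rather than the whole reply gives one row per coin, with nested dicts flattened into dotted columns.
import json
import pandas as pd

# 'response' is the API reply from the elided part of the script above
data = json.loads(response.text)

# One row per coin; nested dicts such as 'quote' become dotted
# columns like 'quote.USD.price'. List-valued fields ('tags')
# stay as Python lists in a single cell.
df = pd.json_normalize(data['data'])

df.to_excel("PLS.xlsx", index=False)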

Choosing an intercept in lmer

I am using an lmer model (from lmerTest) to understand whether size is significantly correlated with gene expression, and if so, which specific genes are correlated with size (also accounting for 'female' and 'cage' as random effects):
lmer(Expression ~ size*genes + (1|female) + (1|cage), data = df)
In the summary output, one of my genes is being used as the intercept (since it is first in the alphabet, 'ctsk'). After reading around, it was recommended that I choose the highest (or lowest) expressed gene as my intercept to compare everything else against. In this case, the gene 'star' was the highest expressed. After re-levelling my data and re-running the model with 'star' as the intercept, ALL the other slopes are now significant in the summary() output, although the anova() output is identical.
My questions are:
Is it possible to not have one of my genes used as an intercept? If it is not possible, then how do I know which gene I should choose as an intercept?
Can I test whether the slopes are different from zero? Perhaps this is where I would specify no intercept in my model (i.e. '0+size*genes')?
Is it possible to have the intercept as the mean of all slopes?
I will then use lsmeans to determine whether the slopes are significantly different from each other.
Here is some reproducible code:
df <- structure(list(size = c(13.458, 13.916, 13.356, 13.84, 14.15,
16.4, 15.528, 13.916, 13.458, 13.285, 15.415, 14.181, 13.367,
13.356, 13.947, 14.615, 15.804, 15.528, 16.811, 14.677, 13.2,
17.57, 13.947, 14.15, 16.833, 13.2, 17.254, 16.4, 14.181, 13.367,
14.294, 13.84, 16.833, 17.083, 15.847, 13.399, 14.15, 15.47,
13.356, 14.615, 15.415, 15.596, 15.847, 16.833, 13.285, 15.47,
15.596, 14.181, 13.356, 14.294, 15.415, 15.363, 15.4, 12.851,
17.254, 13.285, 17.57, 14.7, 17.57, 13.947, 16.811, 15.4, 13.399,
14.22, 13.285, 14.344, 17.083, 15.363, 14.677, 15.945), female = structure(c(7L,
12L, 7L, 11L, 12L, 9L, 6L, 12L, 7L, 7L, 6L, 12L, 8L, 7L, 7L,
11L, 9L, 6L, 10L, 11L, 8L, 10L, 7L, 12L, 10L, 8L, 10L, 9L, 12L,
8L, 12L, 11L, 10L, 10L, 9L, 8L, 12L, 6L, 7L, 11L, 6L, 9L, 9L,
10L, 7L, 6L, 9L, 12L, 7L, 12L, 6L, 6L, 6L, 8L, 10L, 7L, 10L,
11L, 10L, 7L, 10L, 6L, 8L, 11L, 7L, 6L, 10L, 6L, 11L, 9L), .Label = c("2",
"3", "6", "10", "11", "16", "18", "24", "25", "28", "30", "31",
"116", "119", "128", "135", "150", "180", "182", "184", "191",
"194", "308", "311", "313", "315", "320", "321", "322", "324",
"325", "329", "339", "342"), class = "factor"), Expression = c(1.10620339407889,
1.06152707257767, 2.03000185674761, 1.92971750056866, 1.30833983462599,
1.02760836165184, 0.960969703469363, 1.54706275342441, 0.314774666283256,
2.63330873720495, 0.895123048920455, 0.917716470037954, 1.3178821021651,
1.57879156856332, 0.633429011784367, 1.12641940390116, 1.0117475796626,
0.687813581350802, 0.923485880847423, 2.98926377892241, 0.547685277701021,
0.967691178046748, 2.04562285257417, 1.09072264997544, 1.57682235413366,
0.967061529758701, 0.941995966023426, 0.299517719292817, 1.8654758451133,
0.651369936708288, 1, 1.04407979584122, 0.799275069735012, 1.007255409328,
0.428129727802404, 0.93927930755046, 0.987394257033815, 0.965050972503591,
2.06719308587322, 1.63846508102874, 0.997380526962644, 0.60270197593643,
2.78682867333149, 0.552922632281237, 3.06702198884562, 0.890708510580522,
1.15168812515828, 0.929205084743164, 2.27254101826041, 1, 0.958147442333527,
1.05924173014089, 0.984356852670054, 0.623630720815415, 0.796864961771971,
2.4679841984147, 1.07248904053777, 1.79630829771291, 0.929642913565982,
0.296954006040077, 2.25741254504115, 1.17188536743493, 0.849778293699644,
2.32679163466857, 0.598119006609413, 0.975660099975423, 1.01494421228949,
1.14007557533352, 2.03638316428189, 0.777347547080068), cage = structure(c(64L,
49L, 56L, 66L, 68L, 48L, 53L, 49L, 64L, 56L, 55L, 68L, 80L, 56L,
64L, 75L, 69L, 53L, 59L, 66L, 63L, 59L, 64L, 68L, 59L, 63L, 50L,
48L, 68L, 80L, 49L, 66L, 59L, 50L, 48L, 63L, 68L, 62L, 56L, 75L,
55L, 81L, 48L, 59L, 56L, 62L, 81L, 68L, 56L, 49L, 55L, 62L, 55L,
63L, 50L, 56L, 59L, 75L, 59L, 64L, 59L, 55L, 63L, 66L, 56L, 53L,
50L, 62L, 66L, 81L), .Label = c("023", "024", "041", "042", "043",
"044", "044 bis", "045", "046", "047", "049", "051", "053", "058",
"060", "061", "068", "070", "071", "111", "112", "113", "123",
"126", "128", "14", "15", "23 bis", "24", "39", "41", "42", "44",
"46 bis", "47", "49", "51", "53", "58", "60", "61", "67", "68",
"70", "75", "76", "9", "D520", "D521", "D522", "D526", "D526bis",
"D533", "D535", "D539", "D544", "D545", "D545bis", "D546", "D561",
"D561bis", "D564", "D570", "D581", "D584", "D586", "L611", "L616",
"L633", "L634", "L635", "L635bis", "L637", "L659", "L673", "L676",
"L686", "L717", "L718", "L720", "L725", "L727", "L727bis"), class = "factor"),
genes = c("igf1", "gr", "ctsk", "ets2", "ctsk", "mtor", "igf1",
"sgk1", "sgk1", "ghr1", "ghr1", "gr", "ctsk", "ets2", "timp2",
"timp2", "ets2", "rictor", "sparc", "mmp9", "gr", "sparc",
"mmp2", "ghr1", "mmp9", "sparc", "mmp2", "timp2", "star",
"sgk1", "mmp2", "gr", "mmp2", "rictor", "timp2", "mmp2",
"mmp2", "mmp2", "mmp2", "rictor", "mtor", "ghr1", "star",
"igf1", "mmp9", "igf1", "igf2", "rictor", "rictor", "mmp9",
"ets2", "ctsk", "mtor", "ghr1", "mtor", "ets2", "ets2", "igf2",
"igf1", "sgk1", "sgk1", "ghr1", "sgk1", "igf2", "star", "mtor",
"igf2", "ghr1", "mmp2", "rictor")), .Names = c("size", "female",
"Expression", "cage", "genes"), row.names = c(1684L, 2674L, 10350L,
11338L, 10379L, 4586L, 1679L, 3637L, 3610L, 5537L, 5530L, 2676L,
10355L, 11313L, 8422L, 8450L, 11322L, 6494L, 9406L, 13262L, 2653L,
9407L, 12274L, 5564L, 13256L, 9394L, 12294L, 8438L, 750L, 3614L,
12303L, 2671L, 12293L, 6513L, 8437L, 12284L, 12305L, 12267L,
12276L, 6524L, 4567L, 5545L, 733L, 1700L, 13241L, 1674L, 7471L,
6528L, 6498L, 13266L, 11308L, 10347L, 4566L, 5541L, 4590L, 11315L,
11333L, 7482L, 1703L, 3607L, 3628L, 5529L, 3617L, 7483L, 722L,
4565L, 7476L, 5532L, 12299L, 6510L), class = "data.frame")
genes <- as.factor(df$genes)
library(lmerTest)
fit1 <- lmer(Expression ~ size * genes +(1|female) + (1|cage), data = df)
anova(fit1)
summary(fit1) # uses the gene 'ctsk' for the intercept, so re-level to see what happens with the highest-expressed gene ('star') as the reference:
df$genes <- relevel(genes, "star")
# re-fit the model with 'star' as the intercept:
fit1 <- lmer(Expression ~ size * genes +(1|female) + (1|cage), data = df)
anova(fit1) # no difference here
summary(fit1) # lots of difference
My sample data is pretty long, since the model wouldn't run otherwise; hopefully this is OK!
While it is possible to interpret the coefficients in your fitted model, that isn't the most fruitful or productive approach. Instead, just fit the model using whatever contrast methods are used by default, and follow up with suitable post-hoc analyses.
For that, I suggest using the emmeans (estimated marginal means) package, which is a continuation of lsmeans where all future development will take place. The package has several vignettes, and the one most relevant to your situation is vignette("interactions"), particularly the section on interactions with covariates.
Briefly, comparing intercepts can be very misleading, since those are predictions at size = 0, which is an extrapolation; moreover, as you suggest in your question, the real point here is probably to compare slopes rather than intercepts. For that purpose, there is an emtrends() function (or, if you like, its alias lstrends()).
I also strongly recommend displaying a graph of the model predictions so you can visualize what's going on. This may be done via
library(emmeans)
emmip(fit1, genes ~ size, at = list(size = range(df$size)))
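And to test whether each gene's slope differs from zero, and to compare the slopes pairwise, something along these lines should work (a sketch; note the factor in your model is called genes):
library(emmeans)

# One estimated size slope per gene
gene_slopes <- emtrends(fit1, ~ genes, var = "size")

# Confidence intervals and t-tests of each slope against zero
summary(gene_slopes, infer = TRUE)

# Pairwise comparisons of the slopes between genes
pairs(gene_slopes)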

un-nest json columns in pandas

I have the code below, which reads data from a JSON file into a pandas dataframe. Some of the columns, like "attributes", still wind up with dicts in them. I'd like them to be columns like "attributes.GoodForMeal.Dessert", similar to what the flatten function from R does.
Can anyone suggest a way to do this in python?
Code:
df_business = pd.read_json('dataset/business.json', lines=True)
print(df_business[1:3])
Data:
address attributes \
1 2824 Milton Rd {u'GoodForMeal': {u'dessert': False, u'latenig...
2 337 Danforth Avenue {u'BusinessParking': {u'garage': False, u'stre...
business_id categories \
1 mLwM-h2YhXl2NCgdS84_Bw [Food, Soul Food, Convenience Stores, Restaura...
2 v2WhjAB3PIBA8J8VxG3wEg [Food, Coffee & Tea]
city hours is_open \
1 Charlotte {u'Monday': u'10:00-22:00', u'Tuesday': u'10:0... 0
2 Toronto {u'Monday': u'10:00-19:00', u'Tuesday': u'10:0... 0
latitude longitude name neighborhood \
1 35.236870 -80.741976 South Florida Style Chicken & Ribs Eastland
2 43.677126 -79.353285 The Tea Emporium Riverdale
postal_code review_count stars state
1 28215 4 4.5 NC
2 M4K 1N7 7 4.5 ON
Update:
from pandas.io.json import json_normalize
print json_normalize('dataset/business.json')
Error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-12-bb0ce59acb26> in <module>()
1 from pandas.io.json import json_normalize
----> 2 print json_normalize('dataset/business.json')
/Users/anaconda/lib/python2.7/site-packages/pandas/io/json.pyc in json_normalize(data, record_path, meta, meta_prefix, record_prefix)
791
792 if record_path is None:
--> 793 if any([isinstance(x, dict) for x in compat.itervalues(data[0])]):
794 # naive normalization, this is idempotent for flat records
795 # and potentially will inflate the data considerably for
/Users/anaconda/lib/python2.7/site-packages/pandas/compat/__init__.pyc in itervalues(obj, **kw)
169
170 def itervalues(obj, **kw):
--> 171 return obj.itervalues(**kw)
172
173 next = lambda it : it.next()
AttributeError: 'str' object has no attribute 'itervalues'
Update2:
Code:
import json;
json_normalize(json.load('dataset/business.json'))
Error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-20-4fb4bf64efc6> in <module>()
1 import json;
----> 2 json_normalize(json.load('dataset/business.json'))
/Users/anaconda/lib/python2.7/json/__init__.pyc in load(fp, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
285
286 """
--> 287 return loads(fp.read(),
288 encoding=encoding, cls=cls, object_hook=object_hook,
289 parse_float=parse_float, parse_int=parse_int,
AttributeError: 'str' object has no attribute 'read'
Update3:
Code:
with open('dataset/business.json') as f:
    df = json_normalize(json.load(f))
Error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-21-e3449614f320> in <module>()
1 with open('dataset/business.json') as f:
----> 2 df = json_normalize(json.load(f))
/Users/anaconda/lib/python2.7/json/__init__.pyc in load(fp, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
289 parse_float=parse_float, parse_int=parse_int,
290 parse_constant=parse_constant, object_pairs_hook=object_pairs_hook,
--> 291 **kw)
292
293
/Users/anaconda/lib/python2.7/json/__init__.pyc in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
337 parse_int is None and parse_float is None and
338 parse_constant is None and object_pairs_hook is None and not kw):
--> 339 return _default_decoder.decode(s)
340 if cls is None:
341 cls = JSONDecoder
/Users/anaconda/lib/python2.7/json/decoder.pyc in decode(self, s, _w)
365 end = _w(s, end).end()
366 if end != len(s):
--> 367 raise ValueError(errmsg("Extra data", s, end, len(s)))
368 return obj
369
ValueError: Extra data: line 2 column 1 - line 156640 column 1 (char 731 - 132272455)
Update4:
Code:
with open('dataset/business.json') as f:
    reviews = f.read().strip().split("\n")
reviews = [json.loads(review) for review in reviews]
reviews[1:5]
Sample Data:
[{u'address': u'2824 Milton Rd',
u'attributes': {u'Ambience': {u'casual': False,
u'classy': False,
u'divey': False,
u'hipster': False,
u'intimate': False,
u'romantic': False,
u'touristy': False,
u'trendy': False,
u'upscale': False},
u'BusinessAcceptsCreditCards': False,
u'GoodForKids': True,
u'GoodForMeal': {u'breakfast': False,
u'brunch': False,
u'dessert': False,
u'dinner': False,
u'latenight': False,
u'lunch': False},
u'HasTV': False,
u'NoiseLevel': u'average',
u'OutdoorSeating': False,
u'RestaurantsAttire': u'casual',
u'RestaurantsDelivery': True,
u'RestaurantsGoodForGroups': True,
u'RestaurantsPriceRange2': 2,
u'RestaurantsReservations': False,
u'RestaurantsTakeOut': True},
u'business_id': u'mLwM-h2YhXl2NCgdS84_Bw',
u'categories': [u'Food',
u'Soul Food',
u'Convenience Stores',
u'Restaurants'],
u'city': u'Charlotte',
u'hours': {u'Friday': u'10:00-22:00',
u'Monday': u'10:00-22:00',
u'Saturday': u'10:00-22:00',
u'Sunday': u'10:00-22:00',
u'Thursday': u'10:00-22:00',
u'Tuesday': u'10:00-22:00',
u'Wednesday': u'10:00-22:00'},
u'is_open': 0,
u'latitude': 35.23687,
u'longitude': -80.7419759,
u'name': u'South Florida Style Chicken & Ribs',
u'neighborhood': u'Eastland',
u'postal_code': u'28215',
u'review_count': 4,
u'stars': 4.5,
u'state': u'NC'},
{u'address': u'337 Danforth Avenue',
u'attributes': {u'BikeParking': True,
u'BusinessAcceptsCreditCards': True,
u'BusinessParking': {u'garage': False,
u'lot': False,
u'street': True,
u'valet': False,
u'validated': False},
u'OutdoorSeating': False,
u'RestaurantsPriceRange2': 2,
u'WheelchairAccessible': True,
u'WiFi': u'no'},
u'business_id': u'v2WhjAB3PIBA8J8VxG3wEg',
u'categories': [u'Food', u'Coffee & Tea'],
u'city': u'Toronto',
u'hours': {u'Friday': u'10:00-19:00',
u'Monday': u'10:00-19:00',
u'Saturday': u'10:00-18:00',
u'Sunday': u'12:00-17:00',
u'Thursday': u'10:00-19:00',
u'Tuesday': u'10:00-19:00',
u'Wednesday': u'10:00-19:00'},
u'is_open': 0,
u'latitude': 43.6771258,
u'longitude': -79.3532848,
u'name': u'The Tea Emporium',
u'neighborhood': u'Riverdale',
u'postal_code': u'M4K 1N7',
u'review_count': 7,
u'stars': 4.5,
u'state': u'ON'},
{u'address': u'7702 E Doubletree Ranch Rd, Ste 300',
u'attributes': {},
u'business_id': u'CVtCbSB1zUcUWg-9TNGTuQ',
u'categories': [u'Professional Services', u'Matchmakers'],
u'city': u'Scottsdale',
u'hours': {u'Friday': u'9:00-17:00',
u'Monday': u'9:00-17:00',
u'Thursday': u'9:00-17:00',
u'Tuesday': u'9:00-17:00',
u'Wednesday': u'9:00-17:00'},
u'is_open': 1,
u'latitude': 33.5650816,
u'longitude': -111.9164003,
u'name': u'TRUmatch',
u'neighborhood': u'',
u'postal_code': u'85258',
u'review_count': 3,
u'stars': 3.0,
u'state': u'AZ'},
{u'address': u'4719 N 20Th St',
u'attributes': {u'Alcohol': u'none',
u'Ambience': {u'casual': False,
u'classy': False,
u'divey': False,
u'hipster': False,
u'intimate': False,
u'romantic': False,
u'touristy': False,
u'trendy': False,
u'upscale': False},
u'BikeParking': True,
u'BusinessAcceptsCreditCards': True,
u'BusinessParking': {u'garage': False,
u'lot': False,
u'street': False,
u'valet': False,
u'validated': False},
u'Caters': True,
u'GoodForKids': True,
u'GoodForMeal': {u'breakfast': False,
u'brunch': False,
u'dessert': False,
u'dinner': False,
u'latenight': False,
u'lunch': False},
u'HasTV': False,
u'NoiseLevel': u'quiet',
u'OutdoorSeating': False,
u'RestaurantsAttire': u'casual',
u'RestaurantsDelivery': False,
u'RestaurantsGoodForGroups': True,
u'RestaurantsPriceRange2': 1,
u'RestaurantsReservations': False,
u'RestaurantsTableService': False,
u'RestaurantsTakeOut': True,
u'WiFi': u'no'},
u'business_id': u'duHFBe87uNSXImQmvBh87Q',
u'categories': [u'Sandwiches', u'Restaurants'],
u'city': u'Phoenix',
u'hours': {},
u'is_open': 0,
u'latitude': 33.5059283,
u'longitude': -112.0388474,
u'name': u'Blimpie',
u'neighborhood': u'',
u'postal_code': u'85016',
u'review_count': 10,
u'stars': 4.5,
u'state': u'AZ'}]
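For what it's worth, once the file is read line by line as in Update4, json_normalize on the resulting list of dicts produces exactly the dotted columns asked for. A sketch, assuming a pandas version that exposes json_normalize at the top level (older versions import it from pandas.io.json):
import json
import pandas as pd

with open('dataset/business.json') as f:
    businesses = [json.loads(line) for line in f]

# Nested dicts are flattened into dot-separated column names,
# e.g. 'attributes.GoodForMeal.dessert'
df = pd.json_normalize(businesses)

print(df.filter(like='attributes.GoodForMeal').head())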

Selecting data IN(subquery)

This query works fine:
SELECT *
FROM PRODUKTY WHERE Id_sklep_kategorie IN(669, 670, 671, 672,
673, 674, 683, 686,
687, 688, 689, 690,
691, 692, 693, 694,
695, 696, 697, 698,
699, 700, 701, 845,
846, 847, 848, 849,
850, 851, 898);
But I want it to work more automatically, so I tried this:
SELECT *
FROM PRODUKTY WHERE Id_sklep_kategorie IN(SELECT Id_sklep_kategorie
FROM SKLEP_KATEGORIE);
but it returns all the records...
How can I do this?
It ought to return all the records, because the subquery selects every Id_sklep_kategorie from SKLEP_KATEGORIE. Since the subquery yields all the IDs, the outer query matches every row. If you only want specific rows, you need to restrict the subquery to those IDs, just as you did with the explicit list.
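For example, the subquery needs its own filter (parent_id here is a hypothetical column on SKLEP_KATEGORIE; use whatever column actually distinguishes the categories you want):
SELECT *
FROM PRODUKTY
WHERE Id_sklep_kategorie IN (SELECT Id_sklep_kategorie
                             FROM SKLEP_KATEGORIE
                             WHERE parent_id = 5); -- hypothetical filter column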

ProgrammingError in inserting blob object in MySql using Python

I have a MySQL database and a table with the schema
tweet_id BIGINT
tweet_metadata LONGBLOB
I am trying to insert a row into my database as follows:
import MySQLdb as mysql
host = 'localhost'
user = 'root'
passwd = '************'
db = 'twitter'
insert_tweet_query = ''' INSERT INTO tweets(tweet_id, tweet_metadata) VALUES(%s, %s)'''
''' Creates a MySQL connection and returns the cursor '''
def create_connection():
    connection = mysql.connect(host, user, passwd, db, use_unicode=True)
    connection.set_character_set('utf8')
    cursor = connection.cursor()
    cursor.execute('SET NAMES utf8;')
    cursor.execute('SET CHARACTER SET utf8;')
    cursor.execute('SET character_set_connection=utf8;')
    return connection, cursor

''' Close the connection '''
def close_connection(cursor, connection):
    cursor.close()
    connection.commit()
    connection.close()
connection, cursor = create_connection()
tweet = dict({u'contributors': None, u'truncated': False, u'text': u'RT #HMV_Anime: \u7530\u6751\u3086\u304b\u308a\u59eb\u30d9\u30b9\u30c8\u30a2\u30eb\u30d0\u30e0\u300cEverlasting Gift\u300d\u98db\u3076\u3088\u3046\u306b\u58f2\u308c\u3066\u3044\u307e\u3059\uff01\u6728\u66dc\u306f\u6a2a\u30a2\u30ea\u516c\u6f14\uff01\u300c\u30d1\u30fc\u30c6\u30a3\u30fc\u306f\u7d42\u308f\u3089\u306a\u3044\u300d\u306e\u30e9\u30c3\u30d7\u30d1\u30fc\u30c8\u306e\u4e88\u7fd2\u5fa9\u7fd2\u306b\u3082\u5fc5\u9808\u3067\u3059\uff01 http://t.co/SVWm2E1r http://t.co/rSP ...', u'in_reply_to_status_id': None, u'id': 258550064480387072L, u'source': u'ShootingStar', u'retweeted': False, u'coordinates': None, u'entities': {u'user_mentions': [{u'indices': [3, 13], u'id': 147791077, u'id_str': u'147791077', u'screen_name': u'HMV_Anime', u'name': u'HMV\u30a2\u30cb\u30e1\uff01'}], u'hashtags': [], u'urls': [{u'indices': [100, 120], u'url': u'http://t.co/SVWm2E1r', u'expanded_url': u'http://ow.ly/evEvT', u'display_url': u'ow.ly/evEvT'}, {u'indices': [121, 136], u'url': u'http://t.co/rSP', u'expanded_url': u'http://t.co/rSP', u'display_url': u't.co/rSP'}]}, u'in_reply_to_screen_name': None, u'in_reply_to_user_id': None, u'retweet_count': 40, u'id_str': u'258550064480387072', u'favorited': False, u'retweeted_status': {u'contributors': None, u'truncated': False, u'text': u'\u7530\u6751\u3086\u304b\u308a\u59eb\u30d9\u30b9\u30c8\u30a2\u30eb\u30d0\u30e0\u300cEverlasting Gift\u300d\u98db\u3076\u3088\u3046\u306b\u58f2\u308c\u3066\u3044\u307e\u3059\uff01\u6728\u66dc\u306f\u6a2a\u30a2\u30ea\u516c\u6f14\uff01\u300c\u30d1\u30fc\u30c6\u30a3\u30fc\u306f\u7d42\u308f\u3089\u306a\u3044\u300d\u306e\u30e9\u30c3\u30d7\u30d1\u30fc\u30c8\u306e\u4e88\u7fd2\u5fa9\u7fd2\u306b\u3082\u5fc5\u9808\u3067\u3059\uff01 http://t.co/SVWm2E1r http://t.co/rSPYm0bE #yukarin', u'in_reply_to_status_id': None, u'id': 258160273171574784L, u'source': u'HootSuite', u'retweeted': False, u'coordinates': None, u'entities': {u'user_mentions': [], u'hashtags': [{u'indices': [127, 135], u'text': u'yukarin'}], u'urls': [{u'indices': [85, 105], u'url': u'http://t.co/SVWm2E1r', u'expanded_url': u'http://ow.ly/evEvT', u'display_url': u'ow.ly/evEvT'}, {u'indices': [106, 126], u'url': u'http://t.co/rSPYm0bE', u'expanded_url': u'http://twitpic.com/awuzz0', u'display_url': u'twitpic.com/awuzz0'}]}, u'in_reply_to_screen_name': None, u'in_reply_to_user_id': None, u'retweet_count': 40, u'id_str': u'258160273171574784', u'favorited': False, u'user': {u'follow_request_sent': None, u'profile_use_background_image': True, u'id': 147791077, u'verified': False, u'profile_image_url_https': u'https://si0.twimg.com/profile_images/2573283223/mn4nu924bnxh643sgu1p_normal.jpeg', u'profile_sidebar_fill_color': u'DDEEF6', u'geo_enabled': False, u'profile_text_color': u'333333', u'followers_count': 17108, u'profile_sidebar_border_color': u'C0DEED', u'location': u'\u4e03\u68ee\u4e2d\u5b66\u6821', u'default_profile_image': False, u'listed_count': 1012, u'utc_offset': 32400, u'statuses_count': 33277, u'description': u'\u79c1\u3001\u8d64\u5ea7\u3042\u304b\u308a\u3002\u3069\u3053\u306b\u3067\u3082\u3044\u308b\u3054\u304f\u666e\u901a\u306e\u4e2d\u5b66\u751f\u3002\u305d\u3093\u306a\u79c1\u3060\u3051\u3069\u3001\u6bce\u65e5\u3068\u3063\u3066\u3082\u5145\u5b9f\u3057\u3066\u308b\u306e\u3002\u3060\u3063\u3066\u3042\u304b\u308a\u306f\u2026\u2026 
\u3060\u3063\u3066\u3042\u304b\u308a\u306f\u2026\u2026\u3000\uff08\u203b\u3053\u3061\u3089\u306f#HMV_Japan\u306e\u59c9\u59b9\u30a2\u30ab\u30a6\u30f3\u30c8\u3067\u3059\u3002\u3054\u8cea\u554f\u30fb\u304a\u554f\u3044\u5408\u308f\u305b\u306f\u3001HMV\u30b5\u30a4\u30c8\u4e0a\u306e\u5c02\u7528\u30d5\u30a9\u30fc\u30e0\u3088\u308a\u304a\u9858\u3044\u81f4\u3057\u307e\u3059\u3002\uff09', u'friends_count': 17046, u'profile_link_color': u'0084B4', u'profile_image_url': u'http://a0.twimg.com/profile_images/2573283223/mn4nu924bnxh643sgu1p_normal.jpeg', u'following': None, u'profile_background_image_url_https': u'https://si0.twimg.com/profile_background_images/104844943/bg_hmv2.gif', u'profile_background_color': u'202020', u'id_str': u'147791077', u'profile_background_image_url': u'http://a0.twimg.com/profile_background_images/104844943/bg_hmv2.gif', u'name': u'HMV\u30a2\u30cb\u30e1\uff01', u'lang': u'ja', u'profile_background_tile': False, u'favourites_count': 0, u'screen_name': u'HMV_Anime', u'notifications': None, u'url': u'http://www.hmv.co.jp/anime/', u'created_at': u'Tue May 25 02:07:35 +0000 2010', u'contributors_enabled': False, u'time_zone': u'Tokyo', u'protected': False, u'default_profile': False, u'is_translator': False}, u'geo': None, u'in_reply_to_user_id_str': None, u'possibly_sensitive': False, u'created_at': u'Tue Oct 16 10:59:40 +0000 2012', u'possibly_sensitive_editable': True, u'in_reply_to_status_id_str': None, u'place': None}, u'user': {u'follow_request_sent': None, u'profile_use_background_image': True, u'id': 500471418, u'verified': False, u'profile_image_url_https': u'https://si0.twimg.com/profile_images/2722246932/b71d269b9e1e16f59698b4f7fa23a0fe_normal.jpeg', u'profile_sidebar_fill_color': u'DDEEF6', u'geo_enabled': False, u'profile_text_color': u'333333', u'followers_count': 2241, u'profile_sidebar_border_color': u'C0DEED', u'location': u'\u3072\u3060\u307e\u308a\u8358204\u53f7\u5ba4', u'default_profile_image': False, u'listed_count': 41, u'utc_offset': 32400, u'statuses_count': 18879, u'description': u'\u611f\u3058\u308d\u2026\u2026\u3002 \u2514(\u2510L \u309c\u03c9\u3002)\u2518\u305d\u3057\u3066\uff71\uff8d\u9854\uff80\uff9e\uff8c\uff9e\uff99\uff8b\uff9f\uff70\uff7d\u3060 \u270c( \u055e\u0a0a \u055e)\u270c \u2026\u2026\uff01 \u3051\u3044\u304a\u3093\u3001\u307e\u3069\u30de\u30ae\u3001AB\u3001\u3089\u304d\u2606\u3059\u305f\u3001\u3086\u308b\u3086\u308a\u3001\u30df\u30eb\u30ad\u30a3\u3068\u304b\u306e\u30a2\u30cb\u30e1\u3001\u6771\u65b9\u3001\u30dc\u30ab\u30ed\u597d\u304d\u3060\u3088\u2517(^\u03c9^ )\u251b\u30c7\u30c7\u30f3\uff01 \u30d5\u30a9\u30ed\u30d0\u306f\u3059\u308b\u304b\u3089\u5f85\u3063\u3068\u3044\u3066 \u53ef\u6190\u3061\u3083\u3093\u540c\u76dfNo.9 \u308c\u3044\u3080\u540c\u76dfNo.4 \u898f\u5236\u57a2\u2192#SpeedPer_2', u'friends_count': 2038, u'profile_link_color': u'0084B4', u'profile_image_url': u'http://a0.twimg.com/profile_images/2722246932/b71d269b9e1e16f59698b4f7fa23a0fe_normal.jpeg', u'following': None, u'profile_background_image_url_https': u'https://si0.twimg.com/profile_background_images/600710368/ff2z5gv4s83u313432hj.jpeg', u'profile_background_color': u'C0DEED', u'id_str': u'500471418', u'profile_background_image_url': u'http://a0.twimg.com/profile_background_images/600710368/ff2z5gv4s83u313432hj.jpeg', u'name': u'\u3055\u30fc\u3057\u3083\u3059#\u30cf\u30cb\u30ab\u30e0\u30ac\u30c1\u52e2', u'lang': u'ja', u'profile_background_tile': True, u'favourites_count': 3066, u'screen_name': u'SpeedPer', u'notifications': None, u'url': 
u'https://mobile.twitter.com/account', u'created_at': u'Thu Feb 23 05:10:57 +0000 2012', u'contributors_enabled': False, u'time_zone': u'Irkutsk', u'protected': False, u'default_profile': False, u'is_translator': False}, u'geo': None, u'in_reply_to_user_id_str': None, u'possibly_sensitive': False, u'created_at': u'Wed Oct 17 12:48:33 +0000 2012', u'possibly_sensitive_editable': True, u'in_reply_to_status_id_str': None, u'place': None})
cursor.execute(insert_tweet_query, (tweet['id_str'], tweet))
close_connection(cursor, connection)
However, despite setting the appropriate UTF-8 encodings, I get an exception:
_mysql_exceptions.ProgrammingError: (1064, 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near \': \'NULL\', u\'truncated\': \'0\', u\'text\': "\'RT #HMV_Anime: \\xe7\\x94\\xb0\\xe6\\x9d\\x91\\\' at line 1')
What am I doing wrong?
You could try repr():
cursor.execute(insert_tweet_query, (tweet['id_str'], repr(tweet)))
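repr() gives the driver a plain string it can bind as a parameter, which avoids the syntax error (the dict itself cannot be bound directly). If you'd rather store something easier to parse back later, serializing to JSON is an alternative; a sketch:
import json

# json.dumps produces a plain string the driver can bind as a
# parameter; json.loads restores the dict when the row is read back
cursor.execute(insert_tweet_query, (tweet['id_str'], json.dumps(tweet)))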