Hi, I have this JSON dict that I would like to reproduce in an xlsx or csv file (whichever is easier), but it is so weirdly structured that I have no idea how to format it. This is a snippet of it; the full response is very long and continues in the same structure:
{'status': {'timestamp': '2022-10-03T11:45:57.639Z', 'error_code': 0, 'error_message': None, 'elapsed': 122, 'credit_count': 25, 'notice': None, 'total_count': 9466}, 'data': [{'id': 1, 'name': 'Bitcoin', 'symbol': 'BTC', 'slug': 'bitcoin', 'num_market_pairs': 9758, 'date_added': '2013-04-28T00:00:00.000Z', 'tags': ['mineable', 'pow', 'sha-256', 'store-of-value', 'state-channel', 'coinbase-ventures-portfolio', 'three-arrows-capital-portfolio', 'polychain-capital-portfolio', 'binance-labs-portfolio', 'blockchain-capital-portfolio', 'boostvc-portfolio', 'cms-holdings-portfolio', 'dcg-portfolio', 'dragonfly-capital-portfolio', 'electric-capital-portfolio', 'fabric-ventures-portfolio', 'framework-ventures-portfolio', 'galaxy-digital-portfolio', 'huobi-capital-portfolio', 'alameda-research-portfolio', 'a16z-portfolio', '1confirmation-portfolio', 'winklevoss-capital-portfolio', 'usv-portfolio', 'placeholder-ventures-portfolio', 'pantera-capital-portfolio', 'multicoin-capital-portfolio', 'paradigm-portfolio'], 'max_supply': 21000000, 'circulating_supply': 19167806, 'total_supply': 19167806, 'platform': None, 'cmc_rank': 1, 'self_reported_circulating_supply': None, 'self_reported_market_cap': None, 'tvl_ratio': None, 'last_updated': '2022-10-03T11:43:00.000Z', 'quote': {'USD': {'price': 19225.658331409155, 'volume_24h': 24499551567.663418, 'volume_change_24h': 31.8917, 'percent_change_1h': 0.17357826, 'percent_change_24h': 0.07206242, 'percent_change_7d': 1.89824678, 'percent_change_30d': -3.09210177, 'percent_change_60d': -16.08415351, 'percent_change_90d': -2.52728996, 'market_cap': 368513689118.7344, 'market_cap_dominance': 39.6701, 'fully_diluted_market_cap': 403738824959.59, 'tvl': None, 'last_updated': '2022-10-03T11:43:00.000Z'}}}, {'id': 1027, 'name': 'Ethereum', 'symbol': 'ETH', 'slug': 'ethereum', 'num_market_pairs': 6121, 'date_added': '2015-08-07T00:00:00.000Z', 'tags': ['pos', 'smart-contracts', 'ethereum-ecosystem', 'coinbase-ventures-portfolio', 'three-arrows-capital-portfolio', 'polychain-capital-portfolio', 'binance-labs-portfolio', 'blockchain-capital-portfolio', 'boostvc-portfolio', 'cms-holdings-portfolio', 'dcg-portfolio', 'dragonfly-capital-portfolio', 'electric-capital-portfolio', 'fabric-ventures-portfolio', 'framework-ventures-portfolio', 'hashkey-capital-portfolio', 'kenetic-capital-portfolio', 'huobi-capital-portfolio', 'alameda-research-portfolio', 'a16z-portfolio', '1confirmation-portfolio', 'winklevoss-capital-portfolio', 'usv-portfolio', 'placeholder-ventures-portfolio', 'pantera-capital-portfolio', 'multicoin-capital-portfolio', 'paradigm-portfolio', 'injective-ecosystem'], 'max_supply': None, 'circulating_supply': 122632957.499, 'total_supply': 122632957.499, 'platform': None, 'cmc_rank': 2, 'self_reported_circulating_supply': None, 'self_reported_market_cap': None, 'tvl_ratio': None, 'last_updated': '2022-10-03T11:43:00.000Z', 'quote': {'USD': {'price': 1296.4468710090778, 'volume_24h': 8517497687.565527, 'volume_change_24h': 23.596, 'percent_change_1h': 0.1720414, 'percent_change_24h': -0.21259957, 'percent_change_7d': 0.14320028, 'percent_change_30d': -16.39161383, 'percent_change_60d': -19.95869375, 'percent_change_90d': 15.00727432, 'market_cap': 158987114032.16776, 'market_cap_dominance': 17.1131, 'fully_diluted_market_cap': 158987114032.17, 'tvl': None, 'last_updated': '2022-10-03T11:43:00.000Z'}}}, {'id': 825, 'name': 'Tether', 'symbol': 'USDT', 'slug': 'tether', 'num_market_pairs': 40432, 'date_added': '2015-02-25T00:00:00.000Z', 'tags': ['payments', 
'stablecoin', 'asset-backed-stablecoin', 'avalanche-ecosystem', 'solana-ecosystem', 'arbitrum-ecosytem', 'moonriver-ecosystem', 'injective-ecosystem', 'bnb-chain', 'usd-stablecoin'], 'max_supply': None, 'circulating_supply': 67949424437.85899, 'total_supply': 70155449906.09953, 'platform': .....to be continued
This is all I have:
.....
data = json.loads(response.text)   # parse the API response into a dict
df = pd.json_normalize(data)       # flattens nested dicts, but the 'data' list stays as one column
path = "C:\\Users\\NIWE\\Desktop\\Python\\PLS.xlsx"
writer = pd.ExcelWriter(path, engine="xlsxwriter")
df.to_excel(writer)
writer.save()   # deprecated in pandas 1.5+; writer.close() both saves and closes
#writer.close()
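For the structure shown, the rows you want are in data['data']; normalizing that list directly gives one row per coin. A minimal sketch, reusing the response object from the elided code above (the output filenames are just examples):
import json
import pandas as pd

data = json.loads(response.text)

# One row per coin: json_normalize expands nested dicts such as
# the per-coin quote into dotted columns like 'quote.USD.price'.
df = pd.json_normalize(data["data"])

df.to_csv("coins.csv", index=False)
# or: df.to_excel("coins.xlsx", index=False)   (needs xlsxwriter or openpyxl)
The 'tags' value is a list, so it stays as a single column; everything else flattens cleanly.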
I am using an lmer model (from lmerTest) to understand whether size is significantly correlated with gene expression, and if so, which specific genes are correlated with size (also accounting for 'female' and 'cage' as random effects):
lmer(Expression ~ size*genes + (1|female) + (1|cage), data = df)
In the summary output, one of my genes is being used as the intercept (since it comes first in the alphabet, 'ctsk'). After reading around, it was recommended that I choose the highest (or lowest) expressed gene as my intercept to compare everything else against. In this case, the gene 'star' was the highest expressed. After re-levelling my data and re-running the model with 'star' as the intercept, ALL the other slopes are now significant in the summary() output, although the anova() output is identical.
My questions are:
1. Is it possible to not have one of my genes used as the intercept? If not, how do I know which gene I should choose as the intercept?
2. Can I test whether the slopes are different from zero? Perhaps this is where I would specify no intercept in my model (i.e. '0 + size*genes')?
3. Is it possible to have the intercept be the mean of all the slopes?
I will then use lsmeans to determine whether the slopes are significantly different from each other.
Here is some reproducible code:
df <- structure(list(size = c(13.458, 13.916, 13.356, 13.84, 14.15,
16.4, 15.528, 13.916, 13.458, 13.285, 15.415, 14.181, 13.367,
13.356, 13.947, 14.615, 15.804, 15.528, 16.811, 14.677, 13.2,
17.57, 13.947, 14.15, 16.833, 13.2, 17.254, 16.4, 14.181, 13.367,
14.294, 13.84, 16.833, 17.083, 15.847, 13.399, 14.15, 15.47,
13.356, 14.615, 15.415, 15.596, 15.847, 16.833, 13.285, 15.47,
15.596, 14.181, 13.356, 14.294, 15.415, 15.363, 15.4, 12.851,
17.254, 13.285, 17.57, 14.7, 17.57, 13.947, 16.811, 15.4, 13.399,
14.22, 13.285, 14.344, 17.083, 15.363, 14.677, 15.945), female = structure(c(7L,
12L, 7L, 11L, 12L, 9L, 6L, 12L, 7L, 7L, 6L, 12L, 8L, 7L, 7L,
11L, 9L, 6L, 10L, 11L, 8L, 10L, 7L, 12L, 10L, 8L, 10L, 9L, 12L,
8L, 12L, 11L, 10L, 10L, 9L, 8L, 12L, 6L, 7L, 11L, 6L, 9L, 9L,
10L, 7L, 6L, 9L, 12L, 7L, 12L, 6L, 6L, 6L, 8L, 10L, 7L, 10L,
11L, 10L, 7L, 10L, 6L, 8L, 11L, 7L, 6L, 10L, 6L, 11L, 9L), .Label = c("2",
"3", "6", "10", "11", "16", "18", "24", "25", "28", "30", "31",
"116", "119", "128", "135", "150", "180", "182", "184", "191",
"194", "308", "311", "313", "315", "320", "321", "322", "324",
"325", "329", "339", "342"), class = "factor"), Expression = c(1.10620339407889,
1.06152707257767, 2.03000185674761, 1.92971750056866, 1.30833983462599,
1.02760836165184, 0.960969703469363, 1.54706275342441, 0.314774666283256,
2.63330873720495, 0.895123048920455, 0.917716470037954, 1.3178821021651,
1.57879156856332, 0.633429011784367, 1.12641940390116, 1.0117475796626,
0.687813581350802, 0.923485880847423, 2.98926377892241, 0.547685277701021,
0.967691178046748, 2.04562285257417, 1.09072264997544, 1.57682235413366,
0.967061529758701, 0.941995966023426, 0.299517719292817, 1.8654758451133,
0.651369936708288, 1, 1.04407979584122, 0.799275069735012, 1.007255409328,
0.428129727802404, 0.93927930755046, 0.987394257033815, 0.965050972503591,
2.06719308587322, 1.63846508102874, 0.997380526962644, 0.60270197593643,
2.78682867333149, 0.552922632281237, 3.06702198884562, 0.890708510580522,
1.15168812515828, 0.929205084743164, 2.27254101826041, 1, 0.958147442333527,
1.05924173014089, 0.984356852670054, 0.623630720815415, 0.796864961771971,
2.4679841984147, 1.07248904053777, 1.79630829771291, 0.929642913565982,
0.296954006040077, 2.25741254504115, 1.17188536743493, 0.849778293699644,
2.32679163466857, 0.598119006609413, 0.975660099975423, 1.01494421228949,
1.14007557533352, 2.03638316428189, 0.777347547080068), cage = structure(c(64L,
49L, 56L, 66L, 68L, 48L, 53L, 49L, 64L, 56L, 55L, 68L, 80L, 56L,
64L, 75L, 69L, 53L, 59L, 66L, 63L, 59L, 64L, 68L, 59L, 63L, 50L,
48L, 68L, 80L, 49L, 66L, 59L, 50L, 48L, 63L, 68L, 62L, 56L, 75L,
55L, 81L, 48L, 59L, 56L, 62L, 81L, 68L, 56L, 49L, 55L, 62L, 55L,
63L, 50L, 56L, 59L, 75L, 59L, 64L, 59L, 55L, 63L, 66L, 56L, 53L,
50L, 62L, 66L, 81L), .Label = c("023", "024", "041", "042", "043",
"044", "044 bis", "045", "046", "047", "049", "051", "053", "058",
"060", "061", "068", "070", "071", "111", "112", "113", "123",
"126", "128", "14", "15", "23 bis", "24", "39", "41", "42", "44",
"46 bis", "47", "49", "51", "53", "58", "60", "61", "67", "68",
"70", "75", "76", "9", "D520", "D521", "D522", "D526", "D526bis",
"D533", "D535", "D539", "D544", "D545", "D545bis", "D546", "D561",
"D561bis", "D564", "D570", "D581", "D584", "D586", "L611", "L616",
"L633", "L634", "L635", "L635bis", "L637", "L659", "L673", "L676",
"L686", "L717", "L718", "L720", "L725", "L727", "L727bis"), class = "factor"),
genes = c("igf1", "gr", "ctsk", "ets2", "ctsk", "mtor", "igf1",
"sgk1", "sgk1", "ghr1", "ghr1", "gr", "ctsk", "ets2", "timp2",
"timp2", "ets2", "rictor", "sparc", "mmp9", "gr", "sparc",
"mmp2", "ghr1", "mmp9", "sparc", "mmp2", "timp2", "star",
"sgk1", "mmp2", "gr", "mmp2", "rictor", "timp2", "mmp2",
"mmp2", "mmp2", "mmp2", "rictor", "mtor", "ghr1", "star",
"igf1", "mmp9", "igf1", "igf2", "rictor", "rictor", "mmp9",
"ets2", "ctsk", "mtor", "ghr1", "mtor", "ets2", "ets2", "igf2",
"igf1", "sgk1", "sgk1", "ghr1", "sgk1", "igf2", "star", "mtor",
"igf2", "ghr1", "mmp2", "rictor")), .Names = c("size", "female",
"Expression", "cage", "genes"), row.names = c(1684L, 2674L, 10350L,
11338L, 10379L, 4586L, 1679L, 3637L, 3610L, 5537L, 5530L, 2676L,
10355L, 11313L, 8422L, 8450L, 11322L, 6494L, 9406L, 13262L, 2653L,
9407L, 12274L, 5564L, 13256L, 9394L, 12294L, 8438L, 750L, 3614L,
12303L, 2671L, 12293L, 6513L, 8437L, 12284L, 12305L, 12267L,
12276L, 6524L, 4567L, 5545L, 733L, 1700L, 13241L, 1674L, 7471L,
6528L, 6498L, 13266L, 11308L, 10347L, 4566L, 5541L, 4590L, 11315L,
11333L, 7482L, 1703L, 3607L, 3628L, 5529L, 3617L, 7483L, 722L,
4565L, 7476L, 5532L, 12299L, 6510L), class = "data.frame")
df$genes <- as.factor(df$genes)
library(lmerTest)
fit1 <- lmer(Expression ~ size * genes +(1|female) + (1|cage), data = df)
anova(fit1)
summary(fit1) # uses the gene 'ctsk' as the intercept, so re-level to see what happens with the highest expressed gene ('star') as the reference:
df$genes <- relevel(df$genes, "star")
# re-fit the model with 'star' as the intercept:
fit1 <- lmer(Expression ~ size * genes +(1|female) + (1|cage), data = df)
anova(fit1) # no difference here
summary(fit1) # lots of difference
My sample data is pretty long since the model wouldn't run otherwise; hopefully this is OK!
While it is possible to interpret the coefficients in your fitted model, that isn't the most fruitful or productive approach. Instead, just fit the model using whatever contrasts are used by default, and follow up with suitable post-hoc analyses.
For that, I suggest using the emmeans (estimated marginal means) package, which is the continuation of lsmeans and where all future development will take place. The package has several vignettes, and the one most relevant to your situation is vignette("interactions"), particularly the section on interactions with covariates.
Briefly, comparing intercepts can be very misleading, since those are predictions at size = 0, which is an extrapolation; moreover, as you suggest in your question, the real point here is probably to compare slopes rather than intercepts. For that purpose, there is an emtrends() function (or, if you like, its alias lstrends()).
I also strongly recommend displaying a graph of the model predictions so you can visualize what's going on. This may be done via
library(emmeans)
emmip(fit1, genes ~ size, at = list(size = range(df$size)))
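And to address the slope questions concretely, here is a minimal emtrends() sketch (using fit1 and df from your question; the object name gene_trends is just illustrative):
library(emmeans)

# Per-gene slope of Expression on size, with confidence intervals
# and t-tests of each slope against zero:
gene_trends <- emtrends(fit1, ~ genes, var = "size")
summary(gene_trends, infer = c(TRUE, TRUE))

# Pairwise comparisons of the slopes (Tukey-adjusted by default):
pairs(gene_trends)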
I have the code below, which reads data from a JSON file into a pandas dataframe. Some of the columns, like "attributes", still wind up with dicts in them. I'd like them to be flattened into columns like "attributes.GoodForMeal.dessert", similar to what the flatten function from R does.
Can anyone suggest a way to do this in Python?
Code:
df_business = pd.read_json('dataset/business.json', lines=True)
print(df_business[1:3])
Data:
address attributes \
1 2824 Milton Rd {u'GoodForMeal': {u'dessert': False, u'latenig...
2 337 Danforth Avenue {u'BusinessParking': {u'garage': False, u'stre...
business_id categories \
1 mLwM-h2YhXl2NCgdS84_Bw [Food, Soul Food, Convenience Stores, Restaura...
2 v2WhjAB3PIBA8J8VxG3wEg [Food, Coffee & Tea]
city hours is_open \
1 Charlotte {u'Monday': u'10:00-22:00', u'Tuesday': u'10:0... 0
2 Toronto {u'Monday': u'10:00-19:00', u'Tuesday': u'10:0... 0
latitude longitude name neighborhood \
1 35.236870 -80.741976 South Florida Style Chicken & Ribs Eastland
2 43.677126 -79.353285 The Tea Emporium Riverdale
postal_code review_count stars state
1 28215 4 4.5 NC
2 M4K 1N7 7 4.5 ON
Update:
from pandas.io.json import json_normalize
print json_normalize('dataset/business.json')
Error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-12-bb0ce59acb26> in <module>()
1 from pandas.io.json import json_normalize
----> 2 print json_normalize('dataset/business.json')
/Users/anaconda/lib/python2.7/site-packages/pandas/io/json.pyc in json_normalize(data, record_path, meta, meta_prefix, record_prefix)
791
792 if record_path is None:
--> 793 if any([isinstance(x, dict) for x in compat.itervalues(data[0])]):
794 # naive normalization, this is idempotent for flat records
795 # and potentially will inflate the data considerably for
/Users/anaconda/lib/python2.7/site-packages/pandas/compat/__init__.pyc in itervalues(obj, **kw)
169
170 def itervalues(obj, **kw):
--> 171 return obj.itervalues(**kw)
172
173 next = lambda it : it.next()
AttributeError: 'str' object has no attribute 'itervalues'
Update2:
Code:
import json;
json_normalize(json.load('dataset/business.json'))
Error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-20-4fb4bf64efc6> in <module>()
1 import json;
----> 2 json_normalize(json.load('dataset/business.json'))
/Users/anaconda/lib/python2.7/json/__init__.pyc in load(fp, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
285
286 """
--> 287 return loads(fp.read(),
288 encoding=encoding, cls=cls, object_hook=object_hook,
289 parse_float=parse_float, parse_int=parse_int,
AttributeError: 'str' object has no attribute 'read'
Update3:
Code:
with open('dataset/business.json') as f:
    df = json_normalize(json.load(f))
Error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-21-e3449614f320> in <module>()
1 with open('dataset/business.json') as f:
----> 2 df = json_normalize(json.load(f))
/Users/anaconda/lib/python2.7/json/__init__.pyc in load(fp, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
289 parse_float=parse_float, parse_int=parse_int,
290 parse_constant=parse_constant, object_pairs_hook=object_pairs_hook,
--> 291 **kw)
292
293
/Users/anaconda/lib/python2.7/json/__init__.pyc in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
337 parse_int is None and parse_float is None and
338 parse_constant is None and object_pairs_hook is None and not kw):
--> 339 return _default_decoder.decode(s)
340 if cls is None:
341 cls = JSONDecoder
/Users/anaconda/lib/python2.7/json/decoder.pyc in decode(self, s, _w)
365 end = _w(s, end).end()
366 if end != len(s):
--> 367 raise ValueError(errmsg("Extra data", s, end, len(s)))
368 return obj
369
ValueError: Extra data: line 2 column 1 - line 156640 column 1 (char 731 - 132272455)
Update4:
Code:
with open('dataset/business.json') as f:
    reviews = f.read().strip().split("\n")
reviews = [json.loads(review) for review in reviews]
reviews[1:5]
Sample Data:
[{u'address': u'2824 Milton Rd',
u'attributes': {u'Ambience': {u'casual': False,
u'classy': False,
u'divey': False,
u'hipster': False,
u'intimate': False,
u'romantic': False,
u'touristy': False,
u'trendy': False,
u'upscale': False},
u'BusinessAcceptsCreditCards': False,
u'GoodForKids': True,
u'GoodForMeal': {u'breakfast': False,
u'brunch': False,
u'dessert': False,
u'dinner': False,
u'latenight': False,
u'lunch': False},
u'HasTV': False,
u'NoiseLevel': u'average',
u'OutdoorSeating': False,
u'RestaurantsAttire': u'casual',
u'RestaurantsDelivery': True,
u'RestaurantsGoodForGroups': True,
u'RestaurantsPriceRange2': 2,
u'RestaurantsReservations': False,
u'RestaurantsTakeOut': True},
u'business_id': u'mLwM-h2YhXl2NCgdS84_Bw',
u'categories': [u'Food',
u'Soul Food',
u'Convenience Stores',
u'Restaurants'],
u'city': u'Charlotte',
u'hours': {u'Friday': u'10:00-22:00',
u'Monday': u'10:00-22:00',
u'Saturday': u'10:00-22:00',
u'Sunday': u'10:00-22:00',
u'Thursday': u'10:00-22:00',
u'Tuesday': u'10:00-22:00',
u'Wednesday': u'10:00-22:00'},
u'is_open': 0,
u'latitude': 35.23687,
u'longitude': -80.7419759,
u'name': u'South Florida Style Chicken & Ribs',
u'neighborhood': u'Eastland',
u'postal_code': u'28215',
u'review_count': 4,
u'stars': 4.5,
u'state': u'NC'},
{u'address': u'337 Danforth Avenue',
u'attributes': {u'BikeParking': True,
u'BusinessAcceptsCreditCards': True,
u'BusinessParking': {u'garage': False,
u'lot': False,
u'street': True,
u'valet': False,
u'validated': False},
u'OutdoorSeating': False,
u'RestaurantsPriceRange2': 2,
u'WheelchairAccessible': True,
u'WiFi': u'no'},
u'business_id': u'v2WhjAB3PIBA8J8VxG3wEg',
u'categories': [u'Food', u'Coffee & Tea'],
u'city': u'Toronto',
u'hours': {u'Friday': u'10:00-19:00',
u'Monday': u'10:00-19:00',
u'Saturday': u'10:00-18:00',
u'Sunday': u'12:00-17:00',
u'Thursday': u'10:00-19:00',
u'Tuesday': u'10:00-19:00',
u'Wednesday': u'10:00-19:00'},
u'is_open': 0,
u'latitude': 43.6771258,
u'longitude': -79.3532848,
u'name': u'The Tea Emporium',
u'neighborhood': u'Riverdale',
u'postal_code': u'M4K 1N7',
u'review_count': 7,
u'stars': 4.5,
u'state': u'ON'},
{u'address': u'7702 E Doubletree Ranch Rd, Ste 300',
u'attributes': {},
u'business_id': u'CVtCbSB1zUcUWg-9TNGTuQ',
u'categories': [u'Professional Services', u'Matchmakers'],
u'city': u'Scottsdale',
u'hours': {u'Friday': u'9:00-17:00',
u'Monday': u'9:00-17:00',
u'Thursday': u'9:00-17:00',
u'Tuesday': u'9:00-17:00',
u'Wednesday': u'9:00-17:00'},
u'is_open': 1,
u'latitude': 33.5650816,
u'longitude': -111.9164003,
u'name': u'TRUmatch',
u'neighborhood': u'',
u'postal_code': u'85258',
u'review_count': 3,
u'stars': 3.0,
u'state': u'AZ'},
{u'address': u'4719 N 20Th St',
u'attributes': {u'Alcohol': u'none',
u'Ambience': {u'casual': False,
u'classy': False,
u'divey': False,
u'hipster': False,
u'intimate': False,
u'romantic': False,
u'touristy': False,
u'trendy': False,
u'upscale': False},
u'BikeParking': True,
u'BusinessAcceptsCreditCards': True,
u'BusinessParking': {u'garage': False,
u'lot': False,
u'street': False,
u'valet': False,
u'validated': False},
u'Caters': True,
u'GoodForKids': True,
u'GoodForMeal': {u'breakfast': False,
u'brunch': False,
u'dessert': False,
u'dinner': False,
u'latenight': False,
u'lunch': False},
u'HasTV': False,
u'NoiseLevel': u'quiet',
u'OutdoorSeating': False,
u'RestaurantsAttire': u'casual',
u'RestaurantsDelivery': False,
u'RestaurantsGoodForGroups': True,
u'RestaurantsPriceRange2': 1,
u'RestaurantsReservations': False,
u'RestaurantsTableService': False,
u'RestaurantsTakeOut': True,
u'WiFi': u'no'},
u'business_id': u'duHFBe87uNSXImQmvBh87Q',
u'categories': [u'Sandwiches', u'Restaurants'],
u'city': u'Phoenix',
u'hours': {},
u'is_open': 0,
u'latitude': 33.5059283,
u'longitude': -112.0388474,
u'name': u'Blimpie',
u'neighborhood': u'',
u'postal_code': u'85016',
u'review_count': 10,
u'stars': 4.5,
u'state': u'AZ'}]
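Putting the pieces together: the file is line-delimited JSON (one object per line), which is why json.load() on the whole file raised the "Extra data" ValueError in Update3. The list built in Update4 can go straight into json_normalize, which expands the nested dicts into dotted column names. A minimal sketch against your pandas version (recent pandas exposes the same function as pd.json_normalize):
import json
from pandas.io.json import json_normalize

# One JSON object per line, so parse line by line
with open('dataset/business.json') as f:
    businesses = [json.loads(line) for line in f]

# Nested dicts flatten into dotted columns such as
# 'attributes.GoodForMeal.dessert' and 'hours.Monday'.
df_business = json_normalize(businesses)
print(df_business.filter(like='GoodForMeal').head())
Businesses that lack a given key simply get NaN in that column.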
I have a MySQL database and a table with the schema
tweet_id BIGINT
tweet_metadata LONGBLOB
I am trying to insert a row into my database as follows:
import MySQLdb as mysql
host = 'localhost'
user = 'root'
passwd = '************'
db = 'twitter'
insert_tweet_query = ''' INSERT INTO tweets(tweet_id, tweet_metadata) VALUES(%s, %s)'''
''' Creates a MySQL connection and returns the cursor '''
def create_connection():
    connection = mysql.connect(host, user, passwd, db, use_unicode=True)
    connection.set_character_set('utf8')
    cursor = connection.cursor()
    cursor.execute('SET NAMES utf8;')
    cursor.execute('SET CHARACTER SET utf8;')
    cursor.execute('SET character_set_connection=utf8;')
    return connection, cursor
''' Close the connection '''
def close_connection(cursor, connection):
    cursor.close()
    connection.commit()
    connection.close()
connection, cursor = create_connection()
tweet = dict({u'contributors': None, u'truncated': False, u'text': u'RT #HMV_Anime: \u7530\u6751\u3086\u304b\u308a\u59eb\u30d9\u30b9\u30c8\u30a2\u30eb\u30d0\u30e0\u300cEverlasting Gift\u300d\u98db\u3076\u3088\u3046\u306b\u58f2\u308c\u3066\u3044\u307e\u3059\uff01\u6728\u66dc\u306f\u6a2a\u30a2\u30ea\u516c\u6f14\uff01\u300c\u30d1\u30fc\u30c6\u30a3\u30fc\u306f\u7d42\u308f\u3089\u306a\u3044\u300d\u306e\u30e9\u30c3\u30d7\u30d1\u30fc\u30c8\u306e\u4e88\u7fd2\u5fa9\u7fd2\u306b\u3082\u5fc5\u9808\u3067\u3059\uff01 http://t.co/SVWm2E1r http://t.co/rSP ...', u'in_reply_to_status_id': None, u'id': 258550064480387072L, u'source': u'ShootingStar', u'retweeted': False, u'coordinates': None, u'entities': {u'user_mentions': [{u'indices': [3, 13], u'id': 147791077, u'id_str': u'147791077', u'screen_name': u'HMV_Anime', u'name': u'HMV\u30a2\u30cb\u30e1\uff01'}], u'hashtags': [], u'urls': [{u'indices': [100, 120], u'url': u'http://t.co/SVWm2E1r', u'expanded_url': u'http://ow.ly/evEvT', u'display_url': u'ow.ly/evEvT'}, {u'indices': [121, 136], u'url': u'http://t.co/rSP', u'expanded_url': u'http://t.co/rSP', u'display_url': u't.co/rSP'}]}, u'in_reply_to_screen_name': None, u'in_reply_to_user_id': None, u'retweet_count': 40, u'id_str': u'258550064480387072', u'favorited': False, u'retweeted_status': {u'contributors': None, u'truncated': False, u'text': u'\u7530\u6751\u3086\u304b\u308a\u59eb\u30d9\u30b9\u30c8\u30a2\u30eb\u30d0\u30e0\u300cEverlasting Gift\u300d\u98db\u3076\u3088\u3046\u306b\u58f2\u308c\u3066\u3044\u307e\u3059\uff01\u6728\u66dc\u306f\u6a2a\u30a2\u30ea\u516c\u6f14\uff01\u300c\u30d1\u30fc\u30c6\u30a3\u30fc\u306f\u7d42\u308f\u3089\u306a\u3044\u300d\u306e\u30e9\u30c3\u30d7\u30d1\u30fc\u30c8\u306e\u4e88\u7fd2\u5fa9\u7fd2\u306b\u3082\u5fc5\u9808\u3067\u3059\uff01 http://t.co/SVWm2E1r http://t.co/rSPYm0bE #yukarin', u'in_reply_to_status_id': None, u'id': 258160273171574784L, u'source': u'HootSuite', u'retweeted': False, u'coordinates': None, u'entities': {u'user_mentions': [], u'hashtags': [{u'indices': [127, 135], u'text': u'yukarin'}], u'urls': [{u'indices': [85, 105], u'url': u'http://t.co/SVWm2E1r', u'expanded_url': u'http://ow.ly/evEvT', u'display_url': u'ow.ly/evEvT'}, {u'indices': [106, 126], u'url': u'http://t.co/rSPYm0bE', u'expanded_url': u'http://twitpic.com/awuzz0', u'display_url': u'twitpic.com/awuzz0'}]}, u'in_reply_to_screen_name': None, u'in_reply_to_user_id': None, u'retweet_count': 40, u'id_str': u'258160273171574784', u'favorited': False, u'user': {u'follow_request_sent': None, u'profile_use_background_image': True, u'id': 147791077, u'verified': False, u'profile_image_url_https': u'https://si0.twimg.com/profile_images/2573283223/mn4nu924bnxh643sgu1p_normal.jpeg', u'profile_sidebar_fill_color': u'DDEEF6', u'geo_enabled': False, u'profile_text_color': u'333333', u'followers_count': 17108, u'profile_sidebar_border_color': u'C0DEED', u'location': u'\u4e03\u68ee\u4e2d\u5b66\u6821', u'default_profile_image': False, u'listed_count': 1012, u'utc_offset': 32400, u'statuses_count': 33277, u'description': u'\u79c1\u3001\u8d64\u5ea7\u3042\u304b\u308a\u3002\u3069\u3053\u306b\u3067\u3082\u3044\u308b\u3054\u304f\u666e\u901a\u306e\u4e2d\u5b66\u751f\u3002\u305d\u3093\u306a\u79c1\u3060\u3051\u3069\u3001\u6bce\u65e5\u3068\u3063\u3066\u3082\u5145\u5b9f\u3057\u3066\u308b\u306e\u3002\u3060\u3063\u3066\u3042\u304b\u308a\u306f\u2026\u2026 
\u3060\u3063\u3066\u3042\u304b\u308a\u306f\u2026\u2026\u3000\uff08\u203b\u3053\u3061\u3089\u306f#HMV_Japan\u306e\u59c9\u59b9\u30a2\u30ab\u30a6\u30f3\u30c8\u3067\u3059\u3002\u3054\u8cea\u554f\u30fb\u304a\u554f\u3044\u5408\u308f\u305b\u306f\u3001HMV\u30b5\u30a4\u30c8\u4e0a\u306e\u5c02\u7528\u30d5\u30a9\u30fc\u30e0\u3088\u308a\u304a\u9858\u3044\u81f4\u3057\u307e\u3059\u3002\uff09', u'friends_count': 17046, u'profile_link_color': u'0084B4', u'profile_image_url': u'http://a0.twimg.com/profile_images/2573283223/mn4nu924bnxh643sgu1p_normal.jpeg', u'following': None, u'profile_background_image_url_https': u'https://si0.twimg.com/profile_background_images/104844943/bg_hmv2.gif', u'profile_background_color': u'202020', u'id_str': u'147791077', u'profile_background_image_url': u'http://a0.twimg.com/profile_background_images/104844943/bg_hmv2.gif', u'name': u'HMV\u30a2\u30cb\u30e1\uff01', u'lang': u'ja', u'profile_background_tile': False, u'favourites_count': 0, u'screen_name': u'HMV_Anime', u'notifications': None, u'url': u'http://www.hmv.co.jp/anime/', u'created_at': u'Tue May 25 02:07:35 +0000 2010', u'contributors_enabled': False, u'time_zone': u'Tokyo', u'protected': False, u'default_profile': False, u'is_translator': False}, u'geo': None, u'in_reply_to_user_id_str': None, u'possibly_sensitive': False, u'created_at': u'Tue Oct 16 10:59:40 +0000 2012', u'possibly_sensitive_editable': True, u'in_reply_to_status_id_str': None, u'place': None}, u'user': {u'follow_request_sent': None, u'profile_use_background_image': True, u'id': 500471418, u'verified': False, u'profile_image_url_https': u'https://si0.twimg.com/profile_images/2722246932/b71d269b9e1e16f59698b4f7fa23a0fe_normal.jpeg', u'profile_sidebar_fill_color': u'DDEEF6', u'geo_enabled': False, u'profile_text_color': u'333333', u'followers_count': 2241, u'profile_sidebar_border_color': u'C0DEED', u'location': u'\u3072\u3060\u307e\u308a\u8358204\u53f7\u5ba4', u'default_profile_image': False, u'listed_count': 41, u'utc_offset': 32400, u'statuses_count': 18879, u'description': u'\u611f\u3058\u308d\u2026\u2026\u3002 \u2514(\u2510L \u309c\u03c9\u3002)\u2518\u305d\u3057\u3066\uff71\uff8d\u9854\uff80\uff9e\uff8c\uff9e\uff99\uff8b\uff9f\uff70\uff7d\u3060 \u270c( \u055e\u0a0a \u055e)\u270c \u2026\u2026\uff01 \u3051\u3044\u304a\u3093\u3001\u307e\u3069\u30de\u30ae\u3001AB\u3001\u3089\u304d\u2606\u3059\u305f\u3001\u3086\u308b\u3086\u308a\u3001\u30df\u30eb\u30ad\u30a3\u3068\u304b\u306e\u30a2\u30cb\u30e1\u3001\u6771\u65b9\u3001\u30dc\u30ab\u30ed\u597d\u304d\u3060\u3088\u2517(^\u03c9^ )\u251b\u30c7\u30c7\u30f3\uff01 \u30d5\u30a9\u30ed\u30d0\u306f\u3059\u308b\u304b\u3089\u5f85\u3063\u3068\u3044\u3066 \u53ef\u6190\u3061\u3083\u3093\u540c\u76dfNo.9 \u308c\u3044\u3080\u540c\u76dfNo.4 \u898f\u5236\u57a2\u2192#SpeedPer_2', u'friends_count': 2038, u'profile_link_color': u'0084B4', u'profile_image_url': u'http://a0.twimg.com/profile_images/2722246932/b71d269b9e1e16f59698b4f7fa23a0fe_normal.jpeg', u'following': None, u'profile_background_image_url_https': u'https://si0.twimg.com/profile_background_images/600710368/ff2z5gv4s83u313432hj.jpeg', u'profile_background_color': u'C0DEED', u'id_str': u'500471418', u'profile_background_image_url': u'http://a0.twimg.com/profile_background_images/600710368/ff2z5gv4s83u313432hj.jpeg', u'name': u'\u3055\u30fc\u3057\u3083\u3059#\u30cf\u30cb\u30ab\u30e0\u30ac\u30c1\u52e2', u'lang': u'ja', u'profile_background_tile': True, u'favourites_count': 3066, u'screen_name': u'SpeedPer', u'notifications': None, u'url': 
u'https://mobile.twitter.com/account', u'created_at': u'Thu Feb 23 05:10:57 +0000 2012', u'contributors_enabled': False, u'time_zone': u'Irkutsk', u'protected': False, u'default_profile': False, u'is_translator': False}, u'geo': None, u'in_reply_to_user_id_str': None, u'possibly_sensitive': False, u'created_at': u'Wed Oct 17 12:48:33 +0000 2012', u'possibly_sensitive_editable': True, u'in_reply_to_status_id_str': None, u'place': None})
cursor.execute(insert_tweet_query, (tweet['id_str'], tweet))
close_connection(cursor, connection)
However, despite setting the appropriate UTF-8 encodings, I get the following exception:
_mysql_exceptions.ProgrammingError: (1064, 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near \': \'NULL\', u\'truncated\': \'0\', u\'text\': "\'RT #HMV_Anime: \\xe7\\x94\\xb0\\xe6\\x9d\\x91\\\' at line 1')
What am I doing wrong?
The problem is that you are passing the dict itself as a query parameter; MySQLdb apparently escapes each of its items rather than producing a single string literal (note the individually quoted values in the error message), which is what breaks the SQL. You could serialize the dict to a string first, for example with repr:
cursor.execute(insert_tweet_query, (tweet['id_str'], repr(tweet)))
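Note that getting the dict back from repr() requires eval() or ast.literal_eval(), which is brittle. An alternative (my suggestion, not part of the original answer) is to store it as JSON instead:
import json

# json.dumps() produces a plain string that MySQLdb can bind as a parameter;
# json.loads() on the stored blob recovers the dict when reading it back.
cursor.execute(insert_tweet_query, (tweet['id_str'], json.dumps(tweet)))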