Stream parsing tweets - json

The bottom line: I need to listen to a specific user and forward their tweets to a Telegram bot with minimal delay.
To implement this I use the Tweepy library, which, as I understand it, offers two authentication modes that matter for me:
On behalf of the user - 900 requests per 15 minutes (i.e. 1 request per second)
On behalf of the application - 300 requests per 15 minutes (i.e. 1 request per 3 seconds)
I authenticate my script on behalf of the user with the OAuth1UserHandler function (code below).
Despite this, the requests start slowing down after about 7.5 minutes of work, even though the script only polls the Twitter API once every 1.5 seconds. So one way or another my script seems to be rate-limited as an application. As a workaround I simply made a second bot that starts 7 minutes after the previous one, so the two take turns and the limit window resets. My main problem is that the tweet parsing performance still drops: the acceptable delay is at most 1.5 seconds, but the delays sometimes reach 13 seconds.
Please tell me what I'm doing wrong, or how I can solve this problem better.
Code of one of the two bots:
import tweepy
import datetime
import time
from notifiers import get_notifier
from re import sub

TOKEN = 'telegram token here'
USER_ID = 'telegram user id here'
ADMIN_ID = 'my telegram id here, used to check that the bots work'

auth = tweepy.OAuth1UserHandler(
    consumer_key="consumer key",
    consumer_secret="consumer secret",
    access_token="access token",
    access_token_secret="access token secret",
)
api = tweepy.API(auth)
# auth.set_access_token(access_token, access_token_secret)
print("############### Tokens connected ###############")

user = 'whose username we will listen to'
username_object = api.get_user(screen_name=user)


def listening_to_the_user():
    print(' We start listening to the user...')
    print(' When a user posts a tweet, you will hear an audio notification...')
    seconds_left = 60 * 10
    while seconds_left >= 0:
        # poll only the most recent tweet of the user
        for i in api.user_timeline(user_id=username_object.id, screen_name=user, count=1):
            tweet_post = i.created_at
            tweet_text = sub(r"https?://t.co[^,\s]+,?", "", i.text)
            tweet_time_information = [tweet_post.day, tweet_post.month, tweet_post.year, tweet_post.hour, tweet_post.minute]
            now = datetime.datetime.now()
            current_time = [now.day, now.month, now.year, now.hour, now.minute]
            if tweet_time_information == current_time:
                telegram = get_notifier('telegram')
                notification_about_tweet = f'️{user}⬇️'
                notification_about_tweet_time = f'{tweet_post.day}.{tweet_post.month}.{tweet_post.year}, {tweet_post.hour}:{tweet_post.minute}.{tweet_post.second}'
                notification_about_current_time = f'{now.day}.{now.month}.{now.year}, {now.hour}:{now.minute}.{now.second}'
                telegram.notify(token=TOKEN, chat_id=USER_ID, message=notification_about_tweet)
                telegram.notify(token=TOKEN, chat_id=USER_ID, message=tweet_text)
                try:
                    entities = i.extended_entities
                    itr = entities['media']
                    for img_dict in range(len(itr)):
                        telegram.notify(token=TOKEN, chat_id=ADMIN_ID, message=(entities['media'][img_dict]['media_url_https']))
                except (AttributeError, KeyError):
                    # the tweet has no media attached
                    entities = 0
                telegram.notify(token=TOKEN, chat_id=ADMIN_ID, message=notification_about_tweet)
                telegram.notify(token=TOKEN, chat_id=ADMIN_ID, message=tweet_text)
                telegram.notify(token=TOKEN, chat_id=ADMIN_ID, message=notification_about_tweet_time)
                telegram.notify(token=TOKEN, chat_id=ADMIN_ID, message=notification_about_current_time)
                try:
                    entities = i.extended_entities
                    itr = entities['media']
                    for img_dict in range(len(itr)):
                        telegram.notify(token=TOKEN, chat_id=USER_ID,
                                        message=(entities['media'][img_dict]['media_url_https']))
                except (AttributeError, KeyError):
                    entities = 0
                # after a tweet was forwarded, wait a minute so the same tweet is not sent again
                seconds_left -= 60
                time.sleep(60)
        seconds_left -= 1.5
        time.sleep(1.5)


listening_to_the_user()
Initially I tried to authenticate with a bearer_token, but that literally did not affect the behaviour of my program in any way, so I simply dropped it in favour of the tokens that are in the script now.
I dug through the documentation looking for an answer, but all I came up with was having the second bot start 7 minutes after the first one, so that they work in turn.
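If the goal is minimal delay, polling may be the wrong tool altogether. Below is a hedged sketch (not tested against your account, and assuming your keys still have access to the v1.1 filter stream) of how Tweepy 4.x can push tweets to you via its streaming API instead of you asking for them every 1.5 seconds; TOKEN, USER_ID and username_object are the same values defined in the script above. Tweepy's API object also accepts wait_on_rate_limit=True, but that only waits out the limit rather than removing the delay.
import tweepy
from notifiers import get_notifier

class TweetForwarder(tweepy.Stream):
    # called by Tweepy for every new tweet involving the followed user
    def on_status(self, status):
        telegram = get_notifier('telegram')
        telegram.notify(token=TOKEN, chat_id=USER_ID, message=status.text)

stream = TweetForwarder("consumer key", "consumer secret",
                        "access token", "access token secret")
# follow expects user ids as strings; this call blocks and keeps the connection open
stream.filter(follow=[str(username_object.id)])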

Related

Difficulties with web scraping

I just came across an article called The 500 Greatest Songs of All Time and thought "oh that's cool, I bet they also made a Spotify/Apple Music list that I can follow". Well... they don't.
So, in a nutshell, I wonder if it's possible to 1) scrape the website to extract the songs and 2) then do some kind of bulk upload to Spotify to create the list.
Songs' titles and artists are structured like this on the website:
[Website screenshot] I have already tried to scrape the site with the importxml() formula in Google Sheets, but with no success.
I understand the scraping part is easier than the other, and as I am new to programming I would be happy to manage to partially achieve this goal. I am sure this task can be achieved easily in Python.
I feel like explaining everything would go beyond the scope here, so I tried to comment the code well enough.
1. Scrape the songs
I used Python 3 and Selenium; their website doesn't block that.
Be sure to adjust your chromedriver path, and the output path of the .txt file at the bottom if necessary. Once it's done and you have your .txt file you can close it.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service

s = Service(r'/Users/main/Desktop/chromedriver')
driver = webdriver.Chrome(service=s)

# just setting some vars, I used XPath because I know that
top_500 = 'https://www.rollingstone.com/music/music-lists/best-songs-of-all-time-1224767/'
cookie_button_xpath = "//button[@id='onetrust-accept-btn-handler']"
div_containing_links_xpath = "//div[@id='pmc-gallery-list-nav-bar-render']//child::a"
song_names_xpath = "//article[@class='c-gallery-vertical-album']/child::h2"
links = []
songs = []

driver.get(top_500)

# accept cookies, give time to load
time.sleep(3)
cookie_btn = driver.find_element(By.XPATH, cookie_button_xpath)
cookie_btn.click()
time.sleep(1)

# extracting all the links since there are only 50 songs per page
links_to_next_pages = driver.find_elements(By.XPATH, div_containing_links_xpath)
for element in links_to_next_pages:
    l = element.get_attribute('href')
    links.append(l)

# extracting the songs, then going to next page and so on until we hit 500
counter = 1  # we're starting with 1 here since links[0] is the current page we are already on
while True:
    elements = driver.find_elements(By.XPATH, song_names_xpath)
    for element in elements:
        songs.append(element.text)
    if len(songs) == 500:
        break
    driver.get(links[counter])
    counter += 1
    time.sleep(2)

# verify that there are no duplicates, if there were, something would be off
if len(songs) != len(set(songs)):
    print('you f***** up')
else:
    print('seems fine')

with open('/Users/main/Desktop/output_songs.txt', 'w') as file:
    file.writelines(line + '\n' for line in songs)
2. Prepare Spotify
Go to the Spotify Developer Dashboard and create an account (use your Spotify acc).
Then create an app, call it whatever you want.
On your app click settings and whitelist http://localhost:8888/callback
On your app click "users and access" and add your Spotify account
Leave the tab open, we'll come back to it
3. Prepare Your Environment
You need Node.js so make sure that is installed on your machine
Download this from Spotify's GitHub
Unzip it, cd into the folder and run npm install
Go into the authorization_code folder and open app.js in an editor
Find var scope and append ' playlist-modify-public' to the string; this is so that your app can access your Spotify playlists, see here
Now go back to the app in your Spotify Developer Dashboard; we'll need to copy the Client ID and the Client Secret into var client_id and var client_secret respectively (in the app.js file). var redirect_uri will be
http://localhost:8888/callback - don't forget to save your changes.
4. Run the Spotify side of things
cd into the authorization_code folder and run app.js with node app.js (this is basically a server running on your PC)
Now, if that works, leave it running, go to http://localhost:8888 and authorise your Spotify account there
There, copy the full token, including the overflow; use inspect element to get it
Adjust the user_id and auth variables as well as the path to output_songs.txt (at with open) in the following Python script and run it. Songs which are not found will be printed out at the end; give them a search with Google. They are usually on Spotify as well, but Google seems to have the better search algorithm (surprised Pikachu face).
import requests
import re
import json

# this is NOT your display name, it's your user name!!
user_id = 'YOUR_USERNAME'
# paste your auth token from spotify; it can time out, then you have to get a new one,
# so don't panic if you get a bunch of responses in the 400s after some time
auth = {"Authorization": "Bearer YOUR_AUTH_KEY_FROM_LOCALHOST"}

playlist = []
err_log = []
base_url = 'https://api.spotify.com/v1'
search_method = '/search'

with open('/Users/main/Desktop/output_songs.txt', 'r') as file:
    songs = file.readlines()

# this queries spotify, does some magic and then appends the track's spotify uri to an array
def query_song_uris():
    for n, entry in enumerate(songs):
        x = re.findall(r"'([^']*)'", entry)
        title_len = len(entry) - len(x[0]) - 4
        title = x[0]
        artist = entry[:title_len]
        payload = {
            'q': (entry),
            'track:': (title),
            'artist:': (artist),
            'type': 'track',
            'limit': 1
        }
        url = base_url + search_method
        try:
            r = requests.get(url, params=payload, headers=auth)
            print('\nquerying spotify; ', r)
            c = r.content.decode('UTF-8')
            dic = json.loads(c)
            track_uri = dic["tracks"]["items"][0]["uri"]
            playlist.append(track_uri)
            print(track_uri)
        except:
            err = f'\nNr. {(len(songs)-n)}: ' + f'{entry}'
            err_log.append(err)
    playlist.reverse()

query_song_uris()

# creates a playlist and returns playlist id
def create_playlist():
    payload = {
        "name": "Rolling Stone: Top 500 (All Time)",
        "description": "music for old men xD with occasional hip hop appearences. just kidding"
    }
    url = base_url + f'/users/{user_id}/playlists'
    r = requests.post(url, headers=auth, json=payload)
    c = r.content.decode('UTF-8')
    dic = json.loads(c)
    print(f'\n\ncreating playlist #{dic["id"]}; ', r)
    return dic["id"]

# Spotify only accepts up to 100 tracks per request, so add them in chunks
def add_to_playlist():
    playlist_id = create_playlist()
    while True:
        if len(playlist) > 100:
            p = playlist[:100]
        else:
            p = playlist
        payload = {"uris": (p)}
        url = base_url + f'/playlists/{playlist_id}/tracks'
        r = requests.post(url, headers=auth, json=payload)
        print(f'\nadding {len(p)} songs to playlist; ', r)
        del playlist[:len(p)]
        if len(playlist) == 0:
            break

add_to_playlist()

print('\n\ncheck your spotify :)')
print("\n\n\nthese tracks didn't make it, check manually:\n")
for line in err_log:
    print(line)
print('\n\n')
Done
If you don't want to run the code yourself, here's the playlist:
https://open.spotify.com/playlist/5fdLKYNFlA4XSvhEl36KXS
If you have trouble, everything from step 2 onwards is also described here in the Web API quick start, or in general in the Web API docs.
Regarding Apple Music
So Apple seems very closed up (surprise haha). What I found, though, is that you can query the iTunes store. The response also contains a direct link to the song(s) on Apple Music.
You might be able to go from there.
Get ISRC code from iTunes Search API (Apple music)
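For example, here is a minimal, hedged sketch of querying the iTunes Search API with requests (no authentication needed; the search term is just an illustrative song, and trackViewUrl in the JSON response is the link mentioned above):
import requests

# iTunes Search API: 'entity=song' restricts results to tracks
params = {"term": "Aretha Franklin Respect", "entity": "song", "limit": 1}
r = requests.get("https://itunes.apple.com/search", params=params)
for result in r.json().get("results", []):
    # trackViewUrl is the link to the song on Apple Music / iTunes
    print(result["artistName"], "-", result["trackName"], "->", result["trackViewUrl"])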
PS: undeniably regex is witchcraft, but y'all here got my back

How can I increase the speed of this loop in Python?

I'm new to Python programming and I'm trying to analyze the pending blocks of the BSC network. My program checks for pending blocks (events) and does something with them. The point is that a lot of events happen while the event loop is active, and my processing is too slow to keep up with all the new data being analyzed in real time.
If I remove the hash_analise() function and just print all events, everything is fine: the program receives data faster and I can print all hashes in real time. But when I call this function, my program becomes slower.
I tried threading, but I need to synchronize all the data from the events and wait with thread.join(), and waiting for the thread makes it slower than before.
Is there any way to run this faster?
Thanks for the help. Code without threads:
def hash_analise(hash):
    try:
        hash_analise = web3.eth.get_transaction(hash)
        print_hash = Web3.toJSON(hash_analise)
        print("IMPRIME HASH1:", print_hash)  # prints the transaction as JSON
        if TOKEN_LOWER_CORRIGIDO in print_hash:
            print("\nCONTÉM A STRING ESCOLHIDA")  # the chosen string is present
    except:
        print("TRANSAÇÃO NÃO LOCALIZADA")  # transaction not found

if __name__ == "__main__":
    tx_filter = web3.eth.filter('pending')
    count = 0
    while True:
        for event in tx_filter.get_new_entries():
            evento = Web3.toJSON(event)
            txnhash = evento[1:67]
            hash_analise(txnhash)
            count += 1
            print("Main", count)

Writing a circuit in ZoKrates to proof age is over 21 years

I am trying to see if I can use ZoKrates in a scenario where a user can prove to a verifier that their age is over 21 years without revealing the date of birth. I think it's a good use case for a zero-knowledge proof, but I would like to understand the best way to implement it.
The circuit code (sample) takes the name of the user as a public input (name attestation is done by a trusted authority like the DMV, most likely through a combination of offline/online mechanisms), and then the date of birth, which is a private input.
// 8297122105 = "Razi" in decimal.
def main(pubName, private yearOfBirth, private centuryOfBirth):
  x = 0
  y = 0
  z = 0
  x = if centuryOfBirth == 19 then 1 else 0 fi
  y = if yearOfBirth < 98 then 1 else 0 fi
  z = if pubName == 8297122105 then 1 else 0 fi
  total = x + y + z
  result = if total == 3 then 1 else 0 fi
  return result
Now, using the ./target/release/zokrates generate-proof command, I get the output that can be used as an input to verifier.sol.
A = Pairing.G1Point(0x24cdd31f8e07e854e859aa92c6e7f761bab31b4a871054a82dc01c143bc424d, 0x1eaed5314007d283486826e9e6b369b0f1218d7930cced0dd0e735d3702877ac);
A_p = Pairing.G1Point(0x1d5c046b83c204766f7d7343c76aa882309e6663b0563e43b622d0509ac8e96e, 0x180834d1ec2cd88613384076e953cfd88448920eb9a965ba9ca2a5ec90713dbc);
B = Pairing.G2Point([0x1b51d6b5c411ec0306580277720a9c02aafc9197edbceea5de1079283f6b09dc, 0x294757db1d0614aae0e857df2af60a252aa7b2c6f50b1d0a651c28c4da4a618e], [0x218241f97a8ff1f6f90698ad0a4d11d68956a19410e7d64d4ff8362aa6506bd4, 0x2ddd84d44c16d893800ab5cc05a8d636b84cf9d59499023c6002316851ea5bae]);
B_p = Pairing.G1Point(0x7647a9bf2b6b2fe40f6f0c0670cdb82dc0f42ab6b94fd8a89cf71f6220ce34a, 0x15c5e69bafe69b4a4b50be9adb2d72d23d1aa747d81f4f7835479f79e25dc31c);
C = Pairing.G1Point(0x2dc212a0e81658a83137a1c73ac56d94cb003d05fd63ae8fc4c63c4a369f411c, 0x26dca803604ccc9e24a1af3f9525575e4cc7fbbc3af1697acfc82b534f695a58);
C_p = Pairing.G1Point(0x7eb9c5a93b528559c9b98b1a91724462d07ca5fadbef4a48a36b56affa6489e, 0x1c4e24d15c3e2152284a2042e06cbbff91d3abc71ad82a38b8f3324e7e31f00);
H = Pairing.G1Point(0x1dbeb10800f01c2ad849b3eeb4ee3a69113bc8988130827f1f5c7cf5316960c5, 0xc935d173d13a253478b0a5d7b5e232abc787a4a66a72439cd80c2041c7d18e8);
K = Pairing.G1Point(0x28a0c6fff79ce221fccd5b9a5be9af7d82398efa779692297de974513d2b6ed1, 0x15b807eedf551b366a5a63aad5ab6f2ec47b2e26c4210fe67687f26dbcc7434d);
Question
Consider a scenario where a user (say Razi) takes the proof above (probably in the form of a QR code) and scans it on a machine that runs the verifierTx method on the contract and confirms the age is over 21. Since the proof explicitly has "Razi" inside it, and the contract can verify the age without knowing the actual date of birth, we get better privacy. However, the challenge now is that anyone else can reuse the proof, since it was used within a transaction. One way to mitigate this issue is to make sure that the proof is only valid for a limited time (or is good for one-time use only). Another way is to ensure proof of the user's identity ("Razi") in a way that is satisfied beyond doubt (e.g. by confirming identity on the blockchain, etc.).
Are there ways to make sure the proof can be used by the user more than once?
I hope the question and explanation make sense. Happy to elaborate more on this, so let me know.
What you will need is:
Razi owning an Ethereum public/private key
a (salted) fingerprint of the fact (e.g. birthday as a Unix timestamp) associated with Razi's public Ethereum address and endorsed on-chain by an authority
Now you can write a ZoKrates program like this
def main(private field salt, private field birthdayAsUnixTs, field pubFactHashA, field pubFactHashB, field ts) -> (field):
  // check that the fact corresponds to the endorsed salted fact fingerprint on-chain
  // (sha256packed comes from the ZoKrates stdlib; the exact import path depends on your version)
  h0, h1 = sha256packed(0, 0, salt, birthdayAsUnixTs)
  h0 == pubFactHashA
  h1 == pubFactHashB
  // "18 years" is pseudo code only!
  field ok = if birthdayAsUnixTs + 18 years <= ts then 1 else 0 fi
  return ok
Now, in your contract, you can:
check that msg.sender is the owner of the endorsed fact
require(ts <= now)
call the verifier with the proof and the public input (factHash, ts, 1)
You can do that by hashing the proof and adding that hash to a list of "used proofs", so no one can use it again.
Now, ZoKrates adds randomness to the generation of the proof in order to avoid revealing that the same witness has been used, since zk-proofs do not reveal anything about the witness. So, if you want to prevent the person from using his credential (attesting that he is over 21 years old) more than once, you have to use a nullifier (see the ZCash approach in the "How zk-SNARKs are applied to create a shielded transaction" part).
Basically, you build a string with Razi's data, nullifier_string = centuryOfBirth+yearOfBirth+pubName, and then you publish its hash, nullifier = H(nullifier_string), in a table of revealed nullifiers. In the ZoKrates scheme you have to add the nullifier as a public input and then verify that the nullifier corresponds to the data provided. Something like this:
import "utils/pack/unpack128.code" as unpack
import "hashes/sha256/256bitPadded.code" as hash
import "utils/pack/nonStrictUnpack256.code" as unpack256
def main(pubName,private yearOfBirth, private centuryOfBirth, [2]field nullifier):
field x = if centuryOfBirth == 19 then 1 else 0 fi
field y = if yearOfBirth < 98 then 1 else 0 fi
field z = if pubName == 8297122105 then 1 else 0 fi
total = x + y + z
result = if total == 3 then 1 else 0 fi
null0 = unpack(nullifier[0])
null1 = unpack(nullifier[1])
nullbits = [...null0,...null1]
nullString = centuryOfBirth+yearOfBirth+pubName
unpackNullString = unpack256(nullString)
nullbits == hash(unpackNullString)
return result
This has to be done in order to prevent Razi from providing a random nullifier unrelated to his data.
Once you have done this, you can check whether the nullifier provided has already been used, i.e. whether it is registered in the revealed-nullifier table.
The problem with this in your case is that the year of birth is a weak value to hash. Someone could brute-force the nullifier and reveal Razi's year of birth. You have to add a strong value to the hash (Razi's secret ID? a digital signature?) to prevent this attack.
Note 1: I have an old version of ZoKrates, so check that the import paths are right.
Note 2: Check the ZoKrates hash function implementation; you may have problems with the padding of the inputs. The unpack256 function should prevent this, I suppose, but double-check to avoid bugs.
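To make the revealed-nullifier bookkeeping concrete, here is a minimal off-chain sketch in Python (purely illustrative: in practice the table would live in contract storage, and the salt/secret plays the role of the "strong value" suggested above to resist brute force):
import hashlib

revealed_nullifiers = set()  # on-chain this would be a mapping in the verifier contract

def nullifier(century: int, year: int, pub_name: int, salt: bytes) -> str:
    # the salt (or a user secret) makes brute-forcing the small birth-year space impractical
    data = f"{century}:{year}:{pub_name}".encode() + salt
    return hashlib.sha256(data).hexdigest()

def verify_and_spend(n: str, proof_is_valid: bool) -> bool:
    # accept the proof only if it verifies AND its nullifier has never been seen before
    if not proof_is_valid or n in revealed_nullifiers:
        return False
    revealed_nullifiers.add(n)
    return True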

How to obtain a list of titles of all Wikipedia articles

I'd like to obtain a list of all the titles of all Wikipedia articles. I know there are two possible ways to get content from a Wikimedia-powered wiki: the API, or a database dump.
I'd prefer not to download the wiki dump. First, it's huge, and second, I'm not really experienced with querying databases. The problem with the API, on the other hand, is that I couldn't figure out a way to retrieve only the list of article titles, and even if I could, it would need more than 4 million requests, which would probably get me blocked from any further requests anyway.
So my question is:
Is there a way to obtain only the titles of Wikipedia articles via the API?
Is there a way to combine multiple requests/queries into one? Or do I actually have to download a Wikipedia dump?
The allpages API module allows you to do just that. Its limit (when you set aplimit=max) is 500, so to query all 4.5M articles, you would need about 9000 requests.
But a dump is a better choice, because there are many different dumps, including all-titles-in-ns0 which, as its name suggests, contains exactly what you want (59 MB of gzipped text).
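For illustration, a minimal sketch of paging through the allpages module with requests (apfilterredir=nonredirects keeps redirects out; apcontinue drives the pagination):
import requests

URL = "https://en.wikipedia.org/w/api.php"
params = {
    "action": "query",
    "format": "json",
    "list": "allpages",
    "apnamespace": 0,
    "apfilterredir": "nonredirects",
    "aplimit": "max",
}
while True:
    data = requests.get(URL, params=params).json()
    for page in data["query"]["allpages"]:
        print(page["title"])
    if "continue" not in data:  # no more batches
        break
    params["apcontinue"] = data["continue"]["apcontinue"]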
Right now, as per the current statistics, the number of articles is around 5.8M.
To get the list of pages I used the AllPages API. However, the number of pages I get is around 14.5M, which is ~3 times what I was expecting. I restricted myself to namespace 0 to get the list. The following is the sample code that I am using:
# get the list of all wikipedia pages (articles) -- English
import sys
from simplemediawiki import MediaWiki

listOfPagesFile = open("wikiListOfArticles_nonredirects.txt", "w")

wiki = MediaWiki('https://en.wikipedia.org/w/api.php')

continueParam = ''
requestObj = {}
requestObj['action'] = 'query'
requestObj['list'] = 'allpages'
requestObj['aplimit'] = 'max'
requestObj['apnamespace'] = '0'

pagelist = wiki.call(requestObj)
pagesInQuery = pagelist['query']['allpages']

for eachPage in pagesInQuery:
    pageId = eachPage['pageid']
    title = eachPage['title'].encode('utf-8')
    writestr = str(pageId) + "; " + title + "\n"
    listOfPagesFile.write(writestr)

numQueries = 1

while len(pagelist['query']['allpages']) > 0:
    requestObj['apcontinue'] = pagelist["continue"]["apcontinue"]
    pagelist = wiki.call(requestObj)
    pagesInQuery = pagelist['query']['allpages']

    for eachPage in pagesInQuery:
        pageId = eachPage['pageid']
        title = eachPage['title'].encode('utf-8')
        writestr = str(pageId) + "; " + title + "\n"
        listOfPagesFile.write(writestr)
        # print writestr

    numQueries += 1
    if numQueries % 100 == 0:
        print "Done with queries -- ", numQueries

print numQueries
listOfPagesFile.close()
The number of queries fired is around 28900, which results in approximately 14.5M page names.
I also tried the all-titles link mentioned in the answer above. In that case as well I get around 14.5M pages.
I thought that this overestimate of the actual number of pages was because of redirects, so I added the 'nonredirects' option to the request object:
requestObj['apfilterredir'] = 'nonredirects'
After doing that I get only 112340 pages, which is far too small compared to 5.8M.
With the above code I was expecting roughly 5.8M pages, but that doesn't seem to be the case.
Is there any other option that I should be trying to get the actual (~5.8M) set of page names?
Here is an asynchronous program that will generate MediaWiki page titles:
import json

# `get(http, url, params=...)` is assumed to be an async helper that performs the HTTP request
# (e.g. with aiohttp) and returns the response body, or None on failure;
# `log` is assumed to be a logger such as loguru's.
async def wikimedia_titles(http, wiki="https://en.wikipedia.org/"):
    log.debug('Started generating asynchronously wiki titles at {}', wiki)
    # XXX: https://www.mediawiki.org/wiki/API:Allpages#Python
    url = "{}/w/api.php".format(wiki)
    params = {
        "action": "query",
        "format": "json",
        "list": "allpages",
        "apfilterredir": "nonredirects",
        "apfrom": "",
    }
    while True:
        content = await get(http, url, params=params)
        if content is None:
            continue
        content = json.loads(content)
        for page in content["query"]["allpages"]:
            yield page["title"]
        try:
            apcontinue = content['continue']['apcontinue']
        except KeyError:
            return
        else:
            params["apfrom"] = apcontinue

Too many SQL connections error: due to long polling

I have designed a coding platform, just like SPOJ and Codeforces, for competitions to be organised in my college over LAN.
I have used long polling there so that any announcement from the admin can be broadcast to all users with a JavaScript alert message. When anything is posted on the forum, the admin also gets a notification.
But with just 16 users (including the 1 admin) accessing the site, the server went down showing "too many sql connections". I restarted my laptop (the server) and it worked for a while, then went down again, giving the same error message as before.
When I removed both long-poll processes, everything continued smoothly.
Server-side code for long-poll:
<?php
include 'dbconnect.php';

$old_ann_id = $_GET['old_ann_id'];

$resultann = mysqli_query($con, "SELECT cmntid FROM announcements ORDER BY cmntid DESC LIMIT 1");
while($rowann = mysqli_fetch_array($resultann)){
    $last_ann_id = $rowann['cmntid'];
}

while($last_ann_id <= $old_ann_id){
    usleep(10000000);   // sleep 10 seconds between checks
    clearstatcache();
    $resultann = mysqli_query($con, "SELECT cmntid FROM announcements ORDER BY cmntid DESC LIMIT 1");
    while($rowann = mysqli_fetch_array($resultann)){
        $last_ann_id = $rowann['cmntid'];
    }
}

$response = array();
$response['msg'] = 'new';
$response['old_ann_id'] = $last_ann_id;

$resultann = mysqli_query($con, "SELECT announcements FROM announcements WHERE cmntid = $last_ann_id");
while($rowann = mysqli_fetch_array($resultann)){
    $response['announcement'] = $rowann['announcements'];
}

echo json_encode($response);
max_connections is defined in MySQL. I think the default is 100 or 151 connections, depending on the version of MySQL. You can see the value under "Server variables and settings" in phpMyAdmin (or directly by executing show variables like "max_connections";).
If that is set to something very low (say 10) and you have (say) 15 users, you will hit the limit rapidly. You are giving each long-polling script its own connection, and that connection probably sits open until the long-polling script ends. You could likely reduce this by having the script disconnect after each time it checks the database, then reconnect the next time it checks (i.e. if your long-polling script checks the db every 5 seconds, you probably have well over 4.5 of those 5 seconds where a connection to the db is open but not being used).
However, even with a larger number of connections, if you trigger the ajax polling multiple times per user, each user could hold several simultaneous connections. This is probably quite easy to do with a minor bug in your javascript.
Possibly worse, if you are using persistent connections, you might leave connections open after the user has left the page that calls the long-polling script.
EDIT - update based on your script.
Note I am not sure exactly what your dbconnect.php include is doing. It might be possible to simply call a connect/disconnect function from that include, but in this example code I have just used the mysqli_close and mysqli_connect functions.
<?php
include 'dbconnect.php';

$old_ann_id = $_GET['old_ann_id'];

$resultann = mysqli_query($con, "SELECT MAX(cmntid) AS cmntid FROM announcements");
if($rowann = mysqli_fetch_array($resultann))
{
    $last_ann_id = $rowann['cmntid'];
}

$timeout = 0;
while($last_ann_id <= $old_ann_id and $timeout < 6)
{
    $timeout++;
    mysqli_close($con);
    usleep(10000000);
    clearstatcache();
    $con = mysqli_connect("myhost", "myuser", "mypassw", "mybd");
    $resultann = mysqli_query($con, "SELECT MAX(cmntid) AS cmntid FROM announcements");
    if($rowann = mysqli_fetch_array($resultann))
    {
        $last_ann_id = $rowann['cmntid'];
    }
}

if ($last_ann_id > $old_ann_id)
{
    $response = array();
    $response['msg'] = 'new';
    $response['old_ann_id'] = $last_ann_id;
    $resultann = mysqli_query($con, "SELECT cmntid, announcements FROM announcements WHERE cmntid > $old_ann_id ORDER BY cmntid");
    while($rowann = mysqli_fetch_array($resultann))
    {
        $response['announcement'][] = $rowann['announcements'];
        $response['old_ann_id'] = $rowann['cmntid'];
    }
    mysqli_close($con);
    echo json_encode($response);
}
else
{
    echo "No announcements - resubmit";
}
?>
I have added a count to the main loop, so it will drop out of the loop, whether or not anything is found, once it has executed 6 times. This way, even if someone leaves the page, the script will only keep running for a short time afterwards (a minute at most). You will have to amend your javascript to catch this and resubmit the ajax call.
Also, I have changed the announcement in the response to be an array. This way, if there are several announcements while the script is running, all of them will be brought back.