Gmail API advice

So after much wonderful advice from @patrick, I have managed to get this task further than i imagined, and right now it is already pretty decent, however I am having some strange errors that I am hoping I may find a solution to.

The system I am building needs to query 10 emails all the way up to a million emails if the user desires, i know that in itself sounds like a terrible idea, however for what I am doing it does need it, once it’s done, i think everyone will kind of understand why.

So the google API allows me to get a LIST of all messages up to a limit of 500 in a single query and all it returns as data is something like {“id” : “1742d1dea0ef”} which is like some random ASCII string or something, as well as a single pageToken, which is also like some strange ASCII string, that is non sequential, and by adding that as a parameter, it gets the next 500 message IDs, but at least I can query more than 1 at a time and its pretty fast to just get that.

To get the additional message data, I need to call a different google API, which takes a query parameter of message ID, and as much as I have looked, it is 1 at a time, even though I am not getting the full data back and only a smaller portion of the message headers.

The first time i tried i passed the 500 message IDs to a repeat, which then ran the message detail query 500 times, the issue here is that it hits an error pretty often and just stops, not even a real error, Google just returns something like, the command had an error, they even have the cheek to say “And thats all we know” try again in 30 seconds.
I tried 50 at a time and that seems to run pretty well (most of the time), and I know it is not quota limit, because I am about a billion short of hitting that.

When i tried to add paging into that solution it got a little messy and really wasnt working well. So I tried one at a time and ran the repeat with the message list step, with 1 message ID returned at a time and then passed to the second api for message detail, each iteration takes the pageToken and moves to the next page. This was working alright with initial tests up to about 4000 messages, which is when i stopped it running so I could do something with the returned data.

I want to run a database insert on each iteration on my own database, and I am only inserting and fetching about 5 columns of data per message, so although it could be 30k rows its only header data, no attachments, no message body etc. each column is under 150 chars.

It works, but is unreliable, i get 600 messages sometimes, other times 2700 messages and then it errors with a 504 gateway timeout, its just so random though, after reading up on the error, I think it is something to do with 2 servers, google reading the API call, then another inserting a row at a time of data, so it becomes a little unstable. Has anyone got any other ideas of how i could achieve this.

End result i need all email results from a Google account, only 5 pieces of header data all stored in my database, dont know if using server side for the entire thing, I can wait for 10 api calls, put it in some temp storage and then write 10 rows to the database at a time, to try reduce the reliance on 2 servers or something.
Google says they are good for API calls up to 100 per second per user, and I am only at 1.7 per second, so I assume its not their side but my server.

Anyway, any bright ideas are welcomed, sorry for the long story.

Community Page
Last updated: