Automating Template Data Feeds for Google Base using cron


    #16
    Re: Automating Template Data Feeds for Google Base using cron

    Can this technique be used for other admin modules (not the Template Data Feed by Emporium Plus) that generate external files, such as exported reports or lists?



      #17
      Re: Automating Template Data Feeds for Google Base using cron

      Hi all. We have a large number of products (107,000+) in our store. We set up our template data feed using the following notes:
      http://www.emporiumplus.com/v5/tdfcron.txt


      When we try to test it in the browser, it runs for a bit and then gives us the following error:




      The page isn't redirecting properly

      Firefox has detected that the server is redirecting the request for this address in a way that will never complete.




      We submitted a ticket to Bill, and he said his host had told him that it has to do with a server setting that defines how many redirects you can have.

      I would really appreciate it if Bill's host could chime in and let us know any further details on this, if at all possible. I really do not want to have to manually run this feed every day.

      Thanks!
      Eldon



        #18
        Re: Automating Template Data Feeds for Google Base using cron

        With that number of products, my guess is you're hitting an HTTP connection timeout before the export completes. You'll probably need to have your host increase the web server connection timeout as well as the Miva Empresa engine timeout (aka globaltimeout).
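
        As a sketch, the Apache side of that is the Timeout directive in httpd.conf (the 600 below is illustrative, not a recommendation; the Miva Empresa globaltimeout is raised separately in the engine's own configuration):

        ```apacheconf
        # httpd.conf: maximum seconds Apache will wait on a connection
        Timeout 600
        ```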
        David Hubbard
        CIO
        Miva
        [email protected]
        http://www.miva.com



          #19
          Re: Automating Template Data Feeds for Google Base using cron

          The method he is using reloads the page with the 301 redirects. Would he still encounter the timeout issue?
          Bill Weiland - Emporium Plus http://www.emporiumplus.com/store.mvc
          Online Documentation http://www.emporiumplus.com/tk3/v3/doc.htm
          Question http://www.emporiumplus.com/mivamodu...vc?Screen=SPTS
          Facebook http://www.facebook.com/EmporiumPlus
          Twitter http://twitter.com/emporiumplus



            #20
            Re: Automating Template Data Feeds for Google Base using cron

            I don't really have a way to debug it or know what is occurring with the information given so far. However, if that's what is being done, I would assume there is some maximum number of redirects a browser will follow before it stops.
            David Hubbard
            CIO
            Miva
            [email protected]
            http://www.miva.com



              #21
              Re: Automating Template Data Feeds for Google Base using cron

              It sounds like malformed rewrites stomping on each other.

              One thing that might help: the rewrites apply in a subdirectory first, before rewrites in a higher-level folder kick in (they cascade up the folder structure)...

              You could try moving the rewrite that applies to this single file into the subfolder it appears to be requested from, AND use the rewrite flag at the end of the rule that causes the request to drop out of the .htaccess rewrites entirely. Then none of the other rewrite junk in your higher-level .htaccess files will apply.

              I think it's an [END] or [L] or something like that (I think L and END are a little different).
              Here is a link with more info: http://httpd.apache.org/docs/current/rewrite/flags.html

              Maybe that will help.

              I had a similar issue once. I had to go through all my rules step by step and figure out which ones were whacking each other and causing the request to get stuck in an endless loop. It's NOT a fun process.

              I have to add, I'm not an expert in this, but it can't hurt to try.
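
              For what it's worth, a minimal sketch of that flag (file and folder names are hypothetical; [L] stops the current pass through this .htaccess, while [END], available in Apache 2.4+, stops rewrite processing entirely):

              ```apacheconf
              # /feeds/.htaccess -- hypothetical subfolder the feed file is requested from
              RewriteEngine On
              # Serve feed.xml untouched and stop all further rewrite processing
              RewriteRule ^feed\.xml$ - [END]
              ```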
              Last edited by kayakbabe; 09-25-13, 02:47 PM.



                #22
                Re: Automating Template Data Feeds for Google Base using cron

                Thank you for the replies, everyone. Our main store has a ton of rewrites, so we are testing this with a new store that we haven't launched yet and that has no .htaccess rewrite rules. Do we need to add some specific rewrite rules for the Template Data Feed cron to work? Btw, it is erroring out after the first redirect. I tried playing with the number-of-products-before-refresh setting in the TDF, but that did not help. Also, the globaltimeout is currently 300. I can always try increasing that if you think it will help.

                Please let me know what further info you would need to try to diagnose this.

                Thanks,
                Eldon



                  #23
                  Re: Automating Template Data Feeds for Google Base using cron

                  The globaltimeout is set to 600 seconds, which is already quite long. Setting it to 0 (disabled) does not make any difference. The process starts running and terminates within 1-2 seconds when run from the CLI, so I don't think it's an Apache or Miva VM issue - both can handle processes running much longer than that. Bill suggested his host changed "the number of redirects allowed" but did not elaborate or have more info as to where/what that setting would be. The site in question does have over 2200 custom rewrite rules, on top of the 107,000+ products, so perhaps that's related - the browser just gives up with too many redirects. But even then, shouldn't running the command from the CLI allow it to run longer than 1-2 seconds, with no errors? The script simply terminates with a "Done" message about 1-2 seconds after initiating it. Nothing in the logs, no errors.

                  I'm just wondering whether Bill has had a chance to test this module in a large-store scenario, north of 100,000 products, to ensure it loops properly through all products - using just DB lookups to create the export file, rather than "browsing" the store with GET or cURL to simulate a human visitor. It seems a little unnecessary to do that rather than create a proper export file straight from the DB records and bypass any .htaccess redirects. Or am I missing something in the logic behind this module, that it must look like a human visitor on the site to generate the export file?

                  I've also noticed in the docs there's a mention of "--max-redirs 1000" - since this site has over 2200 rewrite rules, should the client perhaps try bumping this up to, say, 2500 in the cron job? Does this option relate to the 301 redirects that were previously mentioned?
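
                  For reference, the cron entry described in the docs might look something like this (the URL, screen parameter, and schedule here are hypothetical; the --max-redirs bump is the experiment proposed above):

                  ```shell
                  # Hypothetical crontab entry: run the feed nightly at 2:15 AM and
                  # follow up to 2500 redirects (above the site's ~2200 rewrite rules)
                  15 2 * * * curl -s -L --max-redirs 2500 "http://www.example.com/mm5/merchant.mvc?Screen=TDF" >/dev/null 2>&1
                  ```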



                    #24
                    Re: Automating Template Data Feeds for Google Base using cron

                    2200 custom rewrite rules? That is simply ludicrous. And considering more? Absolutely insane!

                    A log analysis needs to be done to see if any of those rewrites can be dropped or moved out of the web-root .htaccess. If they are redirects for old products that don't exist anymore and are only getting one or two hits, drop the rules.

                    Seriously that is just nuts.

                    There ought to be some kind of pattern, so that with regex maybe four or five rules could be used instead of 2200.

                    Every single request for any item (not just a web page), for every single little fragment that makes up a web page, is getting pushed through 2200 rewrite rules.

                    Example: to show a customer a web page, there is the HTML itself, requested by the browser (2200 rules processed for that). Inside that there are probably calls to a CSS file (now you've done 4400 rules), maybe one image (6600 rules), and let's say one external .js file (8800 rules).
                    That is not a typical web page, though. A typical web page might have a dozen images (the logo, navigation images, maybe a promotion or two, some thumbnails), usually one CSS file but maybe 2 or 3, and probably a couple of JavaScript files - and WHAM, suddenly showing one single web page to an end user has caused some 44,000 rules to be parsed. For one page!

                    Compound that with simultaneous visitors and your server is going to have problems; your page loading is going to be affected.
                    If you have no traffic you won't know - this isn't very detectable on low-traffic sites, and site owners get 'used to the way their site behaves'. But if traffic goes up, your visitors will be affected by it: page loading will slow down, and they will just leave. You'll be frustrated about why people aren't sticking around. It's because the more they browse, and the more other people browse at the same time, the worse their experience gets.

                    If some rules relate to an old folder that doesn't exist anymore, recreate that folder and move those rules into it. Then all the requests that do not relate to that folder won't have to be run through them. This is really good to do if you changed your store folder from, say, Merchant2 to mm5. Don't put the rewrite that changes Merchant2 to mm5 in the root folder of the website; instead keep an empty Merchant2 folder containing nothing but an .htaccess file with the 301 rewrites from Merchant2 to mm5.
                    That way, once the search engines catch up, they won't even visit that old Merchant2 folder, and any new traffic won't have to go through that rewrite-rule gauntlet. Old inbound links will still work.

                    Use regular expressions to consolidate those rules into just a few.

                    If you have, say, five rules that you really, really need because they are still getting lots of traffic in your logs from desirable search engines and visitors, keep them. But for other junk that accounts for less than, say, 2% of your page requests (pages, not all items), let the 404 do its job.
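
                    As a sketch of that consolidation (the paths and Miva screen parameters are hypothetical), one pattern-based rule can stand in for hundreds of one-off redirects:

                    ```apacheconf
                    # Before: 2200 one-off redirects like
                    #   Redirect 301 /old-products/widget-a.html /mm5/merchant.mvc?Screen=PROD&Product_Code=widget-a
                    # After: one regex rule covering the whole old folder
                    RewriteEngine On
                    RewriteRule ^old-products/([A-Za-z0-9-]+)\.html$ /mm5/merchant.mvc?Screen=PROD&Product_Code=$1 [R=301,L]
                    ```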



                      #25
                      Re: Automating Template Data Feeds for Google Base using cron

                      I know that, you know that... I keep telling clients anything more than a dozen global redirects is a Very Bad Idea, but, I can't force them not to do it. The store does suffer from high server load, with even low to moderate traffic levels, so yes, I completely agree, there's a major rewrite overload and based on the number of images and items on an average page, there's probably closer to a million rules to be parsed for every page load.

                      Hence I don't think even the --max-redirs flag will do much if the module in question builds the export file by "loading" each page like a regular visitor would. I think the solution would be to rewrite the module to build the feed from MySQL DB data, not from what you "see" in the store as a visitor, browsing each page one at a time and processing all the redirects along the way.

                      Bill - am I understanding correctly that your export module traverses the store itself, one page at a time, to build this export file? Can the logic be changed to build the export file directly from the MySQL DB and bypass the store front end altogether?



                        #26
                        Re: Automating Template Data Feeds for Google Base using cron

                        That is not what the module does. Normally it runs from admin and exports from the DB. You can run a regular cron of that if you don't have too many products, i.e. it can finish within the timeout. But if it can't, it has to refresh using JavaScript, and a cron with wget cannot handle the JavaScript. So the page build is an alternate way of running the cron, displaying the page like a browser would and exporting from that. But to do that, you have to be able to 301 redirect the page multiple times. Simple solution: run the export from the admin like it was originally designed. Some stores just won't be able to use cron.
                        Bill Weiland - Emporium Plus http://www.emporiumplus.com/store.mvc
                        Online Documentation http://www.emporiumplus.com/tk3/v3/doc.htm
                        Question http://www.emporiumplus.com/mivamodu...vc?Screen=SPTS
                        Facebook http://www.facebook.com/EmporiumPlus
                        Twitter http://twitter.com/emporiumplus



                          #27
                          Re: Automating Template Data Feeds for Google Base using cron

                          Bill - can this module be rewritten as a stand-alone MVC or even PHP script that loops through the database until it reaches EOF, so that it can run as a normal cron job and not as a pseudo-cron that initiates the script "through" admin.mvc or merchant.mvc? This would avoid the whole JavaScript issue, wget, etc. I'd personally prefer a PHP script; it should run much faster than calling a Miva script and be much more universal and server-load friendly, especially in large-store scenarios like this one.



                            #28
                            Re: Automating Template Data Feeds for Google Base using cron

                            It is a Miva module in over 2000 stores. It is normally run in admin with no timeout issues. Stores of fewer than 5,000 products may be running cron with wget. You couldn't use a standalone MVC script because you would have the same timeout issues. If someone wants to write a PHP script, perhaps they could sell 2000 copies before Miva makes it obsolete and does it in-house.
                            Bill Weiland - Emporium Plus http://www.emporiumplus.com/store.mvc
                            Online Documentation http://www.emporiumplus.com/tk3/v3/doc.htm
                            Question http://www.emporiumplus.com/mivamodu...vc?Screen=SPTS
                            Facebook http://www.facebook.com/EmporiumPlus
                            Twitter http://twitter.com/emporiumplus



                              #29
                              Re: Automating Template Data Feeds for Google Base using cron

                              I never said that. It works fine for stores with a million products. Size is not an issue as long as you use the module as intended, i.e. run by logging into admin like you do for most other Miva administration functions. It reloads the page using JavaScript until it reaches the end of the DB. But if you want to use cron to trigger an admin event, it cannot use the JavaScript reloading. So simply run the module in admin as intended, for any size store.
                              Bill Weiland - Emporium Plus http://www.emporiumplus.com/store.mvc
                              Online Documentation http://www.emporiumplus.com/tk3/v3/doc.htm
                              Question http://www.emporiumplus.com/mivamodu...vc?Screen=SPTS
                              Facebook http://www.facebook.com/EmporiumPlus
                              Twitter http://twitter.com/emporiumplus



                                #30
                                Re: Automating Template Data Feeds for Google Base using cron

                                Originally posted by eldon99:
                                Thank you for the replies, everyone. Our main store has a ton of rewrites, so we are testing this with a new store that we haven't launched yet and that has no .htaccess rewrite rules. Do we need to add some specific rewrite rules for the Template Data Feed cron to work? Btw, it is erroring out after the first redirect. I tried playing with the number-of-products-before-refresh setting in the TDF, but that did not help. Also, the globaltimeout is currently 300. I can always try increasing that if you think it will help.

                                Please let me know what further info you would need to try to diagnose this.

                                Thanks,
                                Eldon
                                Since this error is occurring on the first redirect, there is likely something in the .htaccess or server config related to redirects, rewrites or 404 pages that is causing an 'internal' Apache redirect to occur. Apache supports redirects and rewrites that affect what is sent to the browser, but it can also do internal proxying of requests, which will also honor redirects if they're given.

                                For example, you could have an incoming request for /file.html externally redirected to /newfile.html, which means the browser is told "hey, you should request /newfile.html instead," and then the browser makes another request. It's also possible to have rewrite rules that cause Apache to rewrite the incoming request to /newfile.html on behalf of the requesting browser; it then behaves as if that is what had been requested to begin with, so the contents served back are those of newfile.html. Finally, there could be an unrelated rewrite/redirect on requests for /newfile.html to /thirdfile.html.

                                So what would occur in that case, if an internal redirect were being used, is: the original request comes in, Apache tries to make the request for the second file instead, is then redirected to the third file, makes another request, and serves that content.

                                The reason why I'm explaining all of that is to show that it's certainly possible to create a redirect loop all internal to the server so the very first external (web browser or cron job) request produces a maximum redirects reached message while only a single redirect is logged in the access log and the 'browser' never even sees its first external 301 redirect.
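
                                That internal loop can be sketched in a few lines of Python (paths are hypothetical; rewrite targets are modeled as a simple lookup table, and the redirect cap stands in for Apache's internal redirect limit):

                                ```python
                                def follow_redirects(rules, start, max_redirects=10):
                                    """Follow a chain of rewrite targets until no rule matches,
                                    or raise once max_redirects is exceeded (a redirect loop)."""
                                    seen = [start]
                                    current = start
                                    for _ in range(max_redirects):
                                        if current not in rules:
                                            return current, seen  # chain terminates normally
                                        current = rules[current]
                                        seen.append(current)
                                    if current not in rules:
                                        return current, seen
                                    raise RuntimeError("maximum redirects reached: " + " -> ".join(seen))

                                # A well-behaved chain: /old.html -> /new.html (no rule for /new.html)
                                clean = {"/old.html": "/new.html"}
                                final, chain = follow_redirects(clean, "/old.html")
                                print(final)  # /new.html

                                # Two rules that quietly conflict: /file.html <-> /newfile.html
                                loop = {"/file.html": "/newfile.html", "/newfile.html": "/file.html"}
                                try:
                                    follow_redirects(loop, "/file.html")
                                except RuntimeError:
                                    print("redirect loop detected")
                                ```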

                                If this is occurring, then there has to be something in conflict with the request that's being made from the cron job (or browser). Kind of a checklist on what to look at would be:

                                1) Make sure there are no disagreements between SSL and non-SSL URLs in what is being requested, what is configured in the copy of Miva Merchant, and what is set up in the web server config. For example, if the site in question has no SSL but the store is configured with https URLs, the store itself is going to try to redirect to a secure link no matter what, if the request involves admin.mvc. Sometimes this can result in the request hitting the wrong website if a different site is the default for the IP address. Or, if the URL being requested is different from what Merchant is configured for, Merchant could be issuing a redirect to a new URL that conflicts.

                                2) If all that checks out, I'd turn on Apache rewrite debugging. This is done in the server config via RewriteLog with a high RewriteLogLevel: http://httpd.apache.org/docs/2.2/mod...tml#rewritelog

                                That will log every rewrite and redirect that occurs as a result, and it will be easy to see what is being issued and then it can be determined why it is being issued.
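
                                In Apache 2.2 syntax that would look something like the following (the log path is hypothetical; on Apache 2.4 the equivalent is LogLevel alert rewrite:trace8):

                                ```apacheconf
                                # httpd.conf (Apache 2.2): log every rewrite decision at high verbosity
                                RewriteLog "/var/log/httpd/rewrite.log"
                                RewriteLogLevel 9
                                ```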
                                David Hubbard
                                CIO
                                Miva
                                [email protected]
                                http://www.miva.com

