TYPO3 – peschuster

Using tx_crawler with tx_simulatebe

July 5, 2011 by peter

Today I encountered some strange behavior of the crawler extension in a TYPO3 installation:

Due to a bug in the extension, tx_crawler got stuck in an infinite redirect loop while trying to call a site as a logged-in frontend user. This led to a “maximum execution depth” PHP/Suhosin error:

ALERT - maximum execution depth reached - script terminated
(attacker 'REMOTE_ADDR not set', file
'xxxxxxxxxxx/typo3conf/ext/crawler/class.tx_crawler_lib.php', line 1228)

It turned out that the TYPO3 site always returned a “HTTP 302 Found” response, but the redirect URL specified in the “Location” header was the same as the request URL.

I emulated the tx_crawler request with Fiddler, but couldn’t reproduce the behavior.

Finally I added some debug calls to the class tx_crawler_lib and noticed a strange cookie in the webserver response headers:

simulatebe=deleted

This cookie seems to be set by the simulatebe extension, which is also used on the site. I looked at the source code of tx_simulatebe_pi1 and indeed, simulatebe does a redirect to t3lib_div::getIndpEnv('TYPO3_REQUEST_URL'), which returns just the URL of the current request.

In my case, simulating a backend user does not really make sense when the site is called through tx_crawler. To prevent simulatebe from returning a “HTTP 302 Found” response to tx_crawler requests, we have to deactivate the plugin just for these requests.

The crawler extension sends “TYPO3 crawler” as its user agent header. Therefore, an easy way to include tx_simulatebe_pi1 conditionally is to check the user agent header against the string “TYPO3 crawler”.

With the following TypoScript code in the site template, tx_simulatebe is included only when the request does not come from tx_crawler:

page.headerData.20 < plugin.tx_simulatebe_pi1
[useragent = TYPO3 crawler]
  page.headerData.20 >
[global]
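
The same condition can also be written the other way round, so that the plugin is only assigned in the [else] branch and never exists for crawler requests in the first place. This is just a minimal sketch, assuming the same object path (page.headerData.20) as above. The leading and trailing * wildcards are my own addition and not required by the setup described here; they would only matter if the user agent string carried a prefix or suffix.

```typoscript
# Crawler requests: do not include tx_simulatebe_pi1 at all
[useragent = *TYPO3 crawler*]
[else]
  # All other requests: include the plugin as before
  page.headerData.20 < plugin.tx_simulatebe_pi1
[global]
```

Both variants should behave the same. Which one to use is mostly a matter of whether you prefer to remove the object for the special case or to add it only for the normal case.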