Quick and dirty loop through multibyte string

As you often need to iterate over UTF-8 characters inside a string, you might be tempted to use mb_substr($text,$i,1). The problem with this is that there is no "magic" way to find the $i-th character inside a UTF-8 string, other than reading it byte by byte from the beginning. Thus a loop which calls mb_substr($text,$i,1) N times, once for each of the N possible values of $i, will take much longer than expected. The larger $i gets, the longer the search for the $i-th letter takes. As characters are between 1 and 4 bytes long (up to 6 in the original, now obsolete definition of UTF-8), one can convince oneself that the execution time of such a loop is actually Theta(N^2), which can be really slow even for moderately long texts.

One way to work around it is to first split your text into an array of letters using some smart preprocessing, and only then iterate over the array.

... self::letters(self::substr($a, $len > 1))

As you can see, Strings::letters($text) splits the text recursively into two parts. Each level of the recursion requires time linear in the length of the string, and there is a logarithmic number of levels, so the total runtime is O(N log N), which is still more than the theoretically optimal O(N), but sadly this is the best idea I've got.
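The recursive split can be sketched roughly as follows. This is a hypothetical reconstruction, not the author's code: the class name Strings and the method name letters are taken from the text, while the method body, the explicit 'UTF-8' encoding argument and the sample string are assumptions made for illustration. It needs PHP 7 or later with the mbstring extension.

```php
<?php
// Hypothetical reconstruction: only the names Strings and letters come from
// the text; the body below is an assumption.
final class Strings
{
    /**
     * Split a UTF-8 string into an array of single characters by
     * recursively halving it.
     *
     * @return string[]
     */
    public static function letters(string $text): array
    {
        $len = mb_strlen($text, 'UTF-8');
        if ($len === 0) {
            return [];
        }
        if ($len === 1) {
            return [$text];
        }
        $half = $len >> 1; // integer division by two
        // Each half is split independently; array_merge renumbers the keys.
        return array_merge(
            self::letters(mb_substr($text, 0, $half, 'UTF-8')),
            self::letters(mb_substr($text, $half, null, 'UTF-8'))
        );
    }
}

// Iterate over the array instead of calling mb_substr($text, $i, 1) in a loop.
foreach (Strings::letters('zażółć') as $i => $letter) {
    echo $i, ': ', $letter, PHP_EOL;
}
```

Every level of the recursion scans each piece once with mb_strlen and mb_substr, which matches the O(N log N) estimate. If a built-in is acceptable, mb_str_split($text, 1, 'UTF-8') (PHP 7.4 and later) or preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) should produce the same array in a single pass.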