[swift-users] What are these types with regular expressions?

Brent Royal-Gordon brent at architechies.com
Sun Aug 7 02:27:12 CDT 2016


> On Aug 6, 2016, at 5:25 AM, 晓敏 褚 via swift-users <swift-users at swift.org> wrote:
> 
> And when I try to use range to get a substring, I got a Range<Int>, but the substring:with: method requires a Range<Index>. But I could not find any information about the type (or protocol?) Index, and passing an Int fails.
> What are they, and how can I work with them?

"The Swift Programming Language" discusses this in more detail, but briefly: String indexing is much more complicated than NSString might make you think. For instance, the character 𠀋 is spread across two UTF-16 "indices", because it is in the Supplementary Ideographic Plane of Unicode. Moreover, there are several different mechanisms that can make a single "character" take up multiple indices. To model this, a Swift String offers several views (`characters`, `unicodeScalars`, `utf16`, and `utf8`), each of which handles indices in a different way. In Swift 2, each of these has its own `Index` type; I believe the plan was for Swift 3 to use one Index type shared between all views, but I'm not sure if that change will make the release version.
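
As a rough illustration of the point above (a sketch only; the variable names `ideograph` and `accented` are made up for this example), here is how the same text reports different lengths depending on the view. It sticks to view properties that are spelled the same way in Swift 2 and Swift 3:

```swift
// The ideograph from the text above: a single Unicode scalar outside the
// Basic Multilingual Plane.
let ideograph = "𠀋"
print(ideograph.unicodeScalars.count) // 1
print(ideograph.utf16.count)          // 2 (a surrogate pair)
print(ideograph.utf8.count)           // 4

// A different mechanism: "e" followed by U+0301 COMBINING ACUTE ACCENT.
// The `characters` view treats this as one character, but the lower-level
// views see more than one element.
let accented = "e\u{301}"
print(accented.unicodeScalars.count)  // 2
print(accented.utf16.count)           // 2
print(accented.utf8.count)            // 3
```

An `Int` offset taken from `NSString` corresponds to the `utf16` numbers here, which is why it cannot be used directly as an index into the other views.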

`NSString`, on the other hand, uses bare `Int`s interpreted as UTF-16 indices. So the way to convert is to translate the `Int` into a `String.UTF16Index`, and then, if you want to go further, translate the `UTF16Index` into a `String.Index`. (This second step can fail if, for instance, the `UTF16Index` points to the second index within 𠀋.) You can do that with an extension like this one:

	// Swift 3:
	extension String.UTF16View {
		func convertedIndex(_ intIndex: Int) -> Index {
			return index(startIndex, offsetBy: intIndex)
		}
		func convertedRange(_ intRange: Range<Int>) -> Range<Index> {
			let lower = convertedIndex(intRange.lowerBound)
			let offset = intRange.upperBound - intRange.lowerBound
			let upper = index(lower, offsetBy: offset)
			
			return lower ..< upper
		}
	}
	extension String {
		func convertedIndex(_ intIndex: Int) -> Index? {
			let utfIndex = utf16.convertedIndex(intIndex)
			return utfIndex.samePosition(in: self)
		}
		func convertedRange(_ intRange: Range<Int>) -> Range<Index>? {
			let utfRange = utf16.convertedRange(intRange)
			guard let lower = utfRange.lowerBound.samePosition(in: self),
				let upper = utfRange.upperBound.samePosition(in: self) else {
				return nil
			}
			return lower ..< upper
		}
	}

	// Swift 2:
	extension String.UTF16View {
		func convertedIndex(intIndex: Int) -> Index {
			return startIndex.advancedBy(intIndex)
		}
		func convertedRange(intRange: Range<Int>) -> Range<Index> {
			let lower = convertedIndex(intRange.startIndex)
			let offset = intRange.endIndex - intRange.startIndex
			let upper = lower.advancedBy(offset)
			
			return lower ..< upper
		}
	}
	extension String {
		func convertedIndex(intIndex: Int) -> Index? {
			let utfIndex = utf16.convertedIndex(intIndex)
			return utfIndex.samePositionIn(self)
		}
		func convertedRange(intRange: Range<Int>) -> Range<Index>? {
			let utfRange = utf16.convertedRange(intRange)
			guard let lower = utfRange.startIndex.samePositionIn(self),
				let upper = utfRange.endIndex.samePositionIn(self) else {
				return nil
			}
			return lower ..< upper
		}
	}

Use it like this:

	let range: Range<Int> = …
	
	// If you want to use String.UTF16Index:
	let convertedRange = string.utf16.convertedRange(range)
	print(string.utf16[convertedRange])
	
	// If you want to use String.Index:
	if let convertedRange = string.convertedRange(range) {
		print(string[convertedRange])
	}
	else {
		print("[Invalid range]")
	}
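
To make the failure case concrete, here is a minimal sketch using the Swift 3 spellings and only standard library calls, so it does not depend on the extensions above. The sample string and the names `start` and `middle` are assumptions chosen for illustration:

```swift
let string = "a𠀋b"
let utf16 = string.utf16

// UTF-16 offset 1 is the first code unit of 𠀋, which is also where the
// character itself begins.
let start = utf16.index(utf16.startIndex, offsetBy: 1)

// UTF-16 offset 2 is the second code unit of 𠀋, i.e. the middle of the
// character.
let middle = utf16.index(utf16.startIndex, offsetBy: 2)

// Converting to String.Index succeeds only on a character boundary.
let startConverted = start.samePosition(in: string)
let middleConverted = middle.samePosition(in: string)

print(startConverted != nil)   // expected: true
print(middleConverted == nil)  // expected: true
```

This is the situation in which the `String` versions of `convertedIndex` and `convertedRange` return `nil`, which is why they are declared as returning optionals.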

Hope this helps,
-- 
Brent Royal-Gordon
Architechies


