Strings Are Not char Pointers


Rust C Strings UTF-8 FFI

C's char * is powerful but vague.

It may mean mutable bytes, a NUL-terminated string, ASCII text, or simply the argument type required by a C API. The caller also has to know who frees the memory, whether it contains \0, what encoding it uses, and where the length comes from.

Rust does not put all of that into one type. It splits the common cases:

one byte: u8
bytes: &[u8] / Vec<u8>
UTF-8 text view: &str
owned UTF-8 text: String
C string view: &CStr
owned C string: CString

This article draws the first boundary: bytes, text, and C strings are different things.

C char * Carries Too Many Meanings

Consider a common C function:

void log_name(const char *name)
{
    printf("name=%s\n", name);
}

The signature hides many rules:

name must not be NULL
name points to a NUL-terminated string
it must not contain an earlier \0
%s reads until the first \0
bytes are interpreted as some text encoding
the function does not own the memory

None of that is in const char *. It is just a pointer.

If the same char * is used for binary data, the rules are different. Binary data may contain zero bytes, must carry a separate length, and must not be printed with %s.

Rust’s string types address this problem first: bytes and text are different types by default.

Bytes Are u8

As covered earlier, Rust char is not C char. Rust char is a Unicode scalar value, not a byte.
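A quick way to see the difference is to compare sizes. The sketch below uses a hypothetical helper, char_sizes, which is not part of the article's code. It returns the in-memory size of a Rust char next to the number of bytes the same character needs when encoded as UTF-8.

```rust
/// Hypothetical helper for illustration only.
/// Returns (size of a Rust `char` in memory, UTF-8 encoded length of `c`),
/// both in bytes.
fn char_sizes(c: char) -> (usize, usize) {
    (std::mem::size_of::<char>(), c.len_utf8())
}

/// A `u8` is always exactly one byte, like C's `unsigned char`.
fn byte_size() -> usize {
    std::mem::size_of::<u8>()
}
```

A Rust char always occupies four bytes, while its UTF-8 encoding takes one to four bytes depending on the character. Neither matches the one-byte C char.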

For raw bytes, use u8:

let byte: u8 = b'A';

For binary data, use &[u8] or Vec<u8>:

fn parse_frame(input: &[u8]) -> Result<(), &'static str> {
    if input.len() < 2 {
        return Err("frame too short");
    }

    Ok(())
}

If data comes from a device, a file, a network packet, compressed bytes, or encrypted bytes, treat it as bytes first. Do not turn it into a string until the encoding boundary is clear.

That is the same engineering distinction C developers already make between uint8_t *buf, size_t len and char *text. Rust just makes the types more explicit.
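As a rough illustration, a slice lets a parser handle zero bytes without any special casing, because the length travels with the pointer. The frame layout below (one kind byte, one payload-length byte, then the payload) and the name split_frame are assumptions made up for this sketch, not a real protocol.

```rust
/// Hypothetical frame layout, assumed for this sketch:
///   byte 0: frame kind
///   byte 1: payload length
///   rest:   payload bytes (may contain 0x00)
fn split_frame(input: &[u8]) -> Result<(u8, &[u8]), &'static str> {
    if input.len() < 2 {
        return Err("frame too short");
    }

    let kind = input[0];
    let payload_len = input[1] as usize;

    // `get` returns None instead of panicking when the range is out of bounds.
    let payload = input
        .get(2..2 + payload_len)
        .ok_or("payload truncated")?;

    Ok((kind, payload))
}
```

Calling split_frame(&[0x01, 0x03, 0x00, 0xFF, 0x00]) yields kind 0x01 and a three-byte payload that starts and ends with a zero byte. A C routine that relied on NUL termination would have seen an empty payload.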

&str Is a UTF-8 Text View

Rust &str is a UTF-8 text view:

fn greet(name: &str) {
    println!("hello, {name}");
}

As a first approximation, read it as:

&str = view over UTF-8 text = pointer + length + UTF-8 validity

It is not a NUL-terminated string, and it is not a raw pointer. Because it has a length, it may contain \0:

let text = "abc\0def";
println!("{}", text.len()); // prints 7: the NUL counts as one ordinary byte

That is completely different from C %s. A C string ends at the first NUL; Rust &str represents text by length, and NUL is just one byte inside the text.

But &str has a requirement: it must contain valid UTF-8. Arbitrary bytes cannot be treated as &str.
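The following sketch makes the length-based model concrete. The helper name describe is hypothetical. It reports the byte length, the number of Unicode scalar values, and the position of the first NUL byte, if there is one.

```rust
/// Hypothetical helper for illustration only.
/// Returns (length in bytes, number of `char`s, byte index of the first NUL).
fn describe(text: &str) -> (usize, usize, Option<usize>) {
    (text.len(), text.chars().count(), text.find('\0'))
}
```

For "abc\0def" this gives 7 bytes, 7 chars, and a NUL at index 3, with the text continuing after it. For "héllo" it gives 6 bytes but 5 chars, because é takes two bytes in UTF-8. So len() counts bytes, not characters.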

String Owns UTF-8 Text

String is owned, growable UTF-8 text:

let mut name = String::from("device");
name.push_str("-01");

Compare it with Vec<u8>:

Vec<u8>  owns growable bytes
String   owns growable UTF-8 text

String can be borrowed as &str:

fn print_name(name: &str) {
    println!("{name}");
}

let name = String::from("sensor");
print_name(&name);

If a function only reads text, prefer accepting &str instead of forcing callers to pass String:

fn is_valid_topic(topic: &str) -> bool {
    !topic.is_empty() && !topic.contains(' ')
}

Callers can pass a string literal or a borrowed String. The function does not take ownership.
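To illustrate, here is one way the different call sites might look. The function is_valid_topic is copied from above; check_all is a hypothetical wrapper added only to group the calls.

```rust
fn is_valid_topic(topic: &str) -> bool {
    !topic.is_empty() && !topic.contains(' ')
}

/// Hypothetical wrapper that calls `is_valid_topic` with three kinds of argument.
fn check_all() -> [bool; 3] {
    let owned = String::from("devices/sensor/state");

    [
        // A string literal is already a &'static str.
        is_valid_topic("devices/sensor/state"),
        // &String coerces to &str automatically.
        is_valid_topic(&owned),
        // A sub-slice of a String is also a &str; this one is "devices".
        is_valid_topic(&owned[..7]),
    ]
}
```

After the calls, owned is still usable by the caller, since each call only borrowed it.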

Bytes to Text Can Fail

C code often treats char * as text directly. Rust makes the encoding boundary explicit.

To convert bytes to &str, check UTF-8:

fn parse_name(input: &[u8]) -> Result<&str, std::str::Utf8Error> {
    std::str::from_utf8(input)
}

If the input is not valid UTF-8, this returns an error.
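A small sketch of both outcomes follows. parse_name is the function from above; name_for_log is a hypothetical fallback that assumes replacing invalid sequences with U+FFFD is acceptable, which tends to hold for log output but not for data that must round-trip.

```rust
use std::str::Utf8Error;

fn parse_name(input: &[u8]) -> Result<&str, Utf8Error> {
    // Borrows the input; no copy is made when the bytes are valid UTF-8.
    std::str::from_utf8(input)
}

/// Hypothetical fallback for display purposes only:
/// invalid sequences are replaced with U+FFFD (the replacement character).
fn name_for_log(input: &[u8]) -> String {
    String::from_utf8_lossy(input).into_owned()
}
```

With the input [b'o', b'k', 0xFF], parse_name returns an error whose valid_up_to() is 2, meaning the first two bytes were valid text. name_for_log returns "ok" followed by the replacement character for the same input.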

That matters for device and protocol work. Not all bytes are text:

serial frames may be binary protocols
file names may cross OS encoding boundaries
network payloads may be compressed or encrypted
strings returned by C APIs may not be UTF-8

Rust is not making strings complicated. It is exposing an encoding assumption that C code often leaves implicit.

C Boundaries Use CStr and CString

When interacting with C APIs, Rust has C string types.

For a NUL-terminated string received from C, use a CStr view:

use std::ffi::CStr;
use std::os::raw::c_char;

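// Safety: `ptr` must be non-null and point to a NUL-terminated string
// that stays valid for the duration of this call.
unsafe fn print_from_c(ptr: *const c_char) {
    let c_str = unsafe { CStr::from_ptr(ptr) };
    println!("{:?}", c_str);
}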
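A CStr is still only bytes with a terminator, so turning it into Rust text is a separate step. The sketch below assumes a hypothetical helper, name_from_c, that rejects NULL and copies the text into an owned String using lossy conversion. A stricter variant could call to_str() instead and surface the Utf8Error.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical helper for illustration only.
///
/// Safety: if `ptr` is non-null, it must point to a NUL-terminated string
/// that stays valid for the duration of this call.
unsafe fn name_from_c(ptr: *const c_char) -> Option<String> {
    // C APIs often use NULL for "no value"; CStr::from_ptr must not see it.
    if ptr.is_null() {
        return None;
    }

    let c_str = unsafe { CStr::from_ptr(ptr) };

    // Copy into an owned String so the result does not depend on C memory.
    // Invalid UTF-8 sequences become U+FFFD.
    Some(c_str.to_string_lossy().into_owned())
}
```

Returning an owned String avoids tying the result to memory that C may free later.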

To pass a Rust string to C, use CString:

use std::ffi::CString;

let name = CString::new("device-01").expect("no interior nul");
let ptr = name.as_ptr();

CString::new checks that there is no interior NUL. C strings use NUL as the terminator, so an interior NUL would make C stop early.
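The check can be observed directly. In this sketch the helper names interior_nul_position and c_bytes are hypothetical; nul_position() and as_bytes_with_nul() are standard library methods.

```rust
use std::ffi::CString;

/// Hypothetical helper: returns the byte index of the first interior NUL,
/// or None when the text can be turned into a C string.
fn interior_nul_position(text: &str) -> Option<usize> {
    match CString::new(text) {
        Ok(_) => None,
        Err(err) => Some(err.nul_position()),
    }
}

/// Hypothetical helper: returns the bytes C would see, terminator included.
fn c_bytes(text: &str) -> Option<Vec<u8>> {
    let c_string = CString::new(text).ok()?;
    Some(c_string.as_bytes_with_nul().to_vec())
}
```

For "abc\0def" the first helper reports position 3. For "ab" the second returns the three bytes a, b, and a trailing 0, which CString appends itself.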

There is also an ownership rule: ptr points into the CString. It is valid only while the CString is alive. If C stores the pointer, the API needs explicit lifetime and release rules.
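The lifetime rule is easy to break by accident, so a sketch may help. Everything here is illustrative: c_side_len stands in for a real C function that only reads the string during the call, and send_name, hand_over, and take_back are hypothetical names.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Stand-in for a C function that reads the string during the call only.
/// Safety: `ptr` must point to a valid NUL-terminated string.
unsafe fn c_side_len(ptr: *const c_char) -> usize {
    let c_str = unsafe { CStr::from_ptr(ptr) };
    c_str.to_bytes().len()
}

/// Borrowing pattern: the CString lives in a named binding for the whole call.
fn send_name(name: &str) -> Option<usize> {
    let c_name = CString::new(name).ok()?;
    let len = unsafe { c_side_len(c_name.as_ptr()) };
    Some(len)
    // `c_name` is dropped here, after the callee has returned.
    //
    // Pitfall to avoid:
    //   let ptr = CString::new(name).unwrap().as_ptr();
    // The temporary CString is dropped at the end of that statement,
    // so `ptr` would dangle before it is ever used.
}

/// Ownership transfer: `into_raw` gives up the allocation without freeing it.
fn hand_over(name: &str) -> Option<*mut c_char> {
    CString::new(name).ok().map(CString::into_raw)
}

/// Safety: `ptr` must come from `CString::into_raw` and be reclaimed only once.
unsafe fn take_back(ptr: *mut c_char) -> CString {
    unsafe { CString::from_raw(ptr) }
}
```

The borrowing pattern fits C functions that use the string and return. The into_raw and from_raw pair fits APIs that store the pointer, provided the memory is eventually handed back to Rust to be freed rather than passed to C's free.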

That is an FFI topic for later. The first rule is simple: use CStr / CString at C string boundaries instead of forcing String into that role.

Ask What the Data Is

When writing a Rust API, do not ask first: “C used char *; what is the Rust replacement?” Ask what the data actually is.

For binary data:

fn parse_payload(payload: &[u8]) {}

For read-only UTF-8 text:

fn parse_topic(topic: &str) {}

For creating and returning text:

fn make_topic(device: &str) -> String {
    format!("devices/{device}/state")
}

For a C API’s NUL-terminated string:

use std::ffi::CStr;
fn pass_name_to_c(name: &CStr) {} // borrow with &CStr; own with CString

This makes API boundaries clearer than using String or Vec<u8> everywhere.

What to Keep from This Article

C char * does not have one Rust replacement:

char byte / unsigned char byte  -> u8
char * + len as binary data     -> &[u8] / Vec<u8>
valid UTF-8 text view           -> &str
owned UTF-8 text                -> String
const char * from C             -> &CStr
owned C string for C API        -> CString

The core distinction is:

bytes are not necessarily text
text is not a C string
C strings are not Rust String

Rust splits these boundaries so encoding, length, NUL termination, and ownership do not all hide inside one char *.

The next article moves to error handling. C return codes, errno, output parameters, and cleanup paths map to Rust Result, ?, and error types.