Computer Science, asked by jayatishah001, 4 months ago

What is the internal structure of a Python string?

Answers

Answered by AshwinJN
8

Explanation:

Python is an object-oriented programming language, like Java, and is usually described as an interpreted language. Python code is organized into interchangeable modules rather than a single long list of instructions, which was the norm in early procedural programming languages.

Answered by Anonymous
7

Answer:

This article describes how Python string interning works in CPython 2.7.7.

A few days ago, I had to explain to a colleague what the built-in function intern does. I gave him the following example:

>>> s1 = 'foo!'
>>> s2 = 'foo!'
>>> s1 is s2
False
>>> s1 = intern('foo!')
>>> s1
'foo!'
>>> s2 = intern('foo!')
>>> s1 is s2
True
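For readers on Python 3, where `intern` is no longer a builtin and is available as `sys.intern` instead, the same behaviour can be sketched roughly as follows. The strings are built at run time with `join` because equal literals in a script may be merged by the compiler. Whether `a is b` holds before interning is a CPython implementation detail, so only the identity after interning should be relied upon.

```python
import sys

# Build two equal strings at run time so the compiler cannot merge them.
a = ''.join(['foo', '!'])
b = ''.join(['foo', '!'])

print(a == b)   # equal by value
print(a is b)   # usually False in CPython: two separate objects

# After interning, both names refer to the single canonical object.
a = sys.intern(a)
b = sys.intern(b)
print(a is b)
```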

You get the idea, but how does it work internally?

The PyStringObject structure

Let’s delve into the CPython source code and take a look at PyStringObject, the C structure that represents Python strings, defined in the file stringobject.h:

typedef struct {
    PyObject_VAR_HEAD
    long ob_shash;
    int ob_sstate;
    char ob_sval[1];

    /* Invariants:
     *     ob_sval contains space for 'ob_size+1' elements.
     *     ob_sval[ob_size] == 0.
     *     ob_shash is the hash of the string or -1 if not computed yet.
     *     ob_sstate != 0 iff the string object is in stringobject.c's
     *       'interned' dictionary; in this case the two references
     *       from 'interned' to this object are *not counted* in ob_refcnt.
     */
} PyStringObject;

According to this comment, the variable ob_sstate is different from 0 if and only if the string is interned. This variable is never accessed directly but always through the macro PyString_CHECK_INTERNED defined a few lines below:

#define PyString_CHECK_INTERNED(op) (((PyStringObject *)(op))->ob_sstate)
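To make the invariants in that comment concrete, here is a small Python model of the structure. It is only an illustrative sketch: `ToyString`, `get_hash` and `check_interned` are hypothetical names invented for this example, not part of CPython, and the hash is simply borrowed from Python's own `hash()`.

```python
class ToyString:
    """Hypothetical Python model of the PyStringObject fields (not CPython code)."""

    def __init__(self, data: bytes):
        self.ob_size = len(data)       # length field provided by PyObject_VAR_HEAD
        self.ob_shash = -1             # -1 means "hash not computed yet"
        self.ob_sstate = 0             # 0 means "not interned"
        self.ob_sval = data + b'\x00'  # room for ob_size + 1 bytes, NUL-terminated

    def get_hash(self) -> int:
        # Compute the hash on first use and cache it in ob_shash.
        # CPython's hash() never returns -1, so -1 is a safe sentinel here.
        if self.ob_shash == -1:
            self.ob_shash = hash(self.ob_sval[:self.ob_size])
        return self.ob_shash


def check_interned(s: ToyString) -> int:
    # Plays the role of the PyString_CHECK_INTERNED macro.
    return s.ob_sstate


s = ToyString(b'foo!')
print(s.ob_size, len(s.ob_sval), s.ob_sval[s.ob_size])
print(check_interned(s))
```

The point of the model is that the hash and the interned flag are stored on the string object itself, so checking either one does not require a dictionary lookup.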

The interned dictionary

Then, let’s open stringobject.c. Line 24 declares a reference to an object where interned strings will be stored:

static PyObject *interned;

In fact, this object is a regular Python dictionary, and it is initialized on line 4745:

interned = PyDict_New();

Finally, all the magic happens in the PyString_InternInPlace function, which starts on line 4732. The implementation is straightforward:

void
PyString_InternInPlace(PyObject **p)
{
    register PyStringObject *s = (PyStringObject *)(*p);
    PyObject *t;
    if (s == NULL || !PyString_Check(s))
        Py_FatalError("PyString_InternInPlace: strings only please!");
    /* If it's a string subclass, we don't really know what putting
       it in the interned dict might do. */
    if (!PyString_CheckExact(s))
        return;
    if (PyString_CHECK_INTERNED(s))
        return;
    if (interned == NULL) {
        interned = PyDict_New();
        if (interned == NULL) {
            PyErr_Clear(); /* Don't leave an exception */
            return;
        }
    }
    t = PyDict_GetItem(interned, (PyObject *)s);
    if (t) {
        Py_INCREF(t);
        Py_DECREF(*p);
        *p = t;
        return;
    }
    if (PyDict_SetItem(interned, (PyObject *)s, (PyObject *)s) < 0) {
        PyErr_Clear();
        return;
    }
    /* The two references in interned are not counted by refcnt.
       The string deallocator will take care of this */
    Py_REFCNT(s) -= 2;
    PyString_CHECK_INTERNED(s) = SSTATE_INTERNED_MORTAL;
}

As you can see, the keys in the interned dictionary are pointers to string objects, and each value is the same pointer as its key. Furthermore, string subclasses cannot be interned. Let me set aside error checking and reference counting and rewrite this function in Python-like pseudocode:

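interned = None

def intern(string):
    global interned
    # Only exact str instances can be interned.
    if string is None or type(string) is not str:
        raise TypeError
    # Pseudo-attribute standing in for the ob_sstate flag.
    if string.is_interned:
        return string
    if interned is None:
        interned = {}
    # Return the canonical copy if an equal string is already interned.
    t = interned.get(string)
    if t is not None:
        return t
    interned[string] = string
    string.is_interned = True
    return string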
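The pseudocode relies on an is_interned attribute that real str objects do not have, so it cannot be run as written. A runnable approximation for Python 3 might look like the sketch below. The names `_interned`, `toy_intern`, `is_interned` and `MyStr` are hypothetical and exist only for this example. The table lookup replaces the ob_sstate flag, and reference counting is ignored entirely. The last part compares the toy function with the real `sys.intern`, which in Python 3 raises TypeError for subclasses of str.

```python
import sys

# Hypothetical stand-in for the C-level 'interned' dictionary.
_interned = {}


def toy_intern(string):
    # Only exact str instances are accepted; subclasses are rejected.
    if type(string) is not str:
        raise TypeError("toy_intern() accepts exact str instances only")
    # setdefault returns the existing canonical object, or stores this one.
    return _interned.setdefault(string, string)


def is_interned(string):
    # Stand-in for PyString_CHECK_INTERNED: is this very object the canonical copy?
    return _interned.get(string) is string


# Two equal strings built at run time, so they start as separate objects.
a = ''.join(['foo', '!'])
b = ''.join(['foo', '!'])
print(toy_intern(a) is toy_intern(b))
print(is_interned(a))


class MyStr(str):
    """Hypothetical str subclass used only for this illustration."""


# Both the toy function and the real sys.intern refuse subclasses.
rejected = {}
for name, fn in (('toy_intern', toy_intern), ('sys.intern', sys.intern)):
    try:
        fn(MyStr('foo!'))
        rejected[name] = False
    except TypeError:
        rejected[name] = True
print(rejected)
```

Unlike the C function, which silently returns when handed a subclass, this sketch raises TypeError, matching what the pseudocode does.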
