Introduction:
Elasticsearch, a distributed search and analytics engine, coupled with Python, offers a powerful solution for indexing and searching large volumes of data efficiently. In this comprehensive guide, we’ll explore how to leverage the Elasticsearch Python client to connect to both Elasticsearch Cloud and local instances, create indices, define custom mappings, ingest data from a CSV file, and perform queries. We’ll illustrate each step with practical examples, demonstrating how Elasticsearch can seamlessly integrate into Python applications.
Connecting to Elasticsearch:
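The first step in working with Elasticsearch is establishing a connection. The Elasticsearch Python client can connect to both Elasticsearch Cloud and local instances; the snippet below targets a local instance at http://localhost:9200 and shows, commented out, how to pass a username and password.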
from elasticsearch import Elasticsearch

ENDPOINT = "http://localhost:9200"

# To connect with a username and password, use this instead
# (the 8.x client takes basic_auth; older clients used http_auth):
# es = Elasticsearch(hosts=[ENDPOINT], basic_auth=(USERNAME, PASSWORD))
es = Elasticsearch(hosts=[ENDPOINT])

# Check that Elasticsearch is reachable; ping() returns True or False
es.ping()
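The text mentions Elasticsearch Cloud, but the snippet above only covers a local instance. Here is a minimal sketch of how the connection options could be assembled for either case, assuming the 8.x Python client, whose constructor accepts `hosts`, `cloud_id`, `api_key`, and `basic_auth`. The helper names `build_client_kwargs` and `connect` are hypothetical, and the cloud ID and API key are placeholders you would replace with your own values.

```python
def build_client_kwargs(endpoint=None, cloud_id=None, api_key=None,
                        username=None, password=None):
    """Assemble keyword arguments for the Elasticsearch(...) constructor."""
    # Exactly one target is expected: a URL (local/self-hosted) or a cloud ID.
    if bool(endpoint) == bool(cloud_id):
        raise ValueError("Provide exactly one of endpoint or cloud_id")
    kwargs = {"cloud_id": cloud_id} if cloud_id else {"hosts": [endpoint]}
    # Prefer an API key when given; otherwise fall back to username/password.
    if api_key:
        kwargs["api_key"] = api_key
    elif username and password:
        kwargs["basic_auth"] = (username, password)
    return kwargs


def connect(**options):
    """Create a client and fail early if the cluster does not answer."""
    # Imported lazily so the helper above can be used without the package.
    from elasticsearch import Elasticsearch
    es = Elasticsearch(**build_client_kwargs(**options))
    if not es.ping():
        raise ConnectionError("Elasticsearch did not respond to ping()")
    return es


# Placeholder values, shown only to illustrate the two shapes of configuration.
local_kwargs = build_client_kwargs(endpoint="http://localhost:9200")
cloud_kwargs = build_client_kwargs(cloud_id="<your-cloud-id>",
                                   api_key="<your-api-key>")
```

Calling `connect(endpoint="http://localhost:9200")` would then return a client equivalent to the `es` object created above, with the ping check built in.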
Creating an Index and Custom Mappings:
Indices are containers for storing documents in Elasticsearch, and mappings define the structure of these documents. Let’s create an index named “my_index” and define a custom mapping.
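# Index name used throughout the rest of the guide
index_name = "my_index"

# Index schema
indexMapping = {
    "properties": {
        "id": {"type": "long"},
        # keyword fields are stored as-is and matched exactly
        "Book": {"type": "keyword"},
        "Page_No": {"type": "long"},
        # text fields are analyzed for full-text search
        "Part": {"type": "text"},
        "Chapter": {"type": "text"},
        "Sub_Chapter": {"type": "text"},
        "Article_No": {"type": "keyword"},
        "Clause_No": {"type": "keyword"},
        # 768-dimensional embedding, indexed for kNN search using L2 (Euclidean) distance
        "Text_vector": {
            "type": "dense_vector",
            "dims": 768,
            "index": True,
            "similarity": "l2_norm"
        }
    }
}

# Uses the default index settings; pass settings={...} to override them
es.indices.create(index=index_name, mappings=indexMapping)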
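Calling `es.indices.create` a second time for an index that already exists raises an error, which is easy to hit when rerunning a script or notebook. One way to guard against that is to check `es.indices.exists` first. The following is a sketch under the assumption of the 8.x client; `ensure_index` is a hypothetical helper, and `FakeClient` is only an in-memory stand-in so the example runs without a cluster.

```python
def ensure_index(es, name, mappings, settings=None):
    """Create the index only if it is missing; return True when it was created."""
    if es.indices.exists(index=name):
        return False
    create_args = {"index": name, "mappings": mappings}
    if settings is not None:
        create_args["settings"] = settings
    es.indices.create(**create_args)
    return True


class FakeIndices:
    """Stand-in for es.indices, used only so this sketch runs without a cluster."""

    def __init__(self):
        self.created = {}

    def exists(self, index):
        return index in self.created

    def create(self, index, mappings=None, settings=None):
        self.created[index] = {"mappings": mappings, "settings": settings}


class FakeClient:
    def __init__(self):
        self.indices = FakeIndices()


client = FakeClient()
mapping = {"properties": {"id": {"type": "long"}}}
first = ensure_index(client, "my_index", mapping)   # index is created
second = ensure_index(client, "my_index", mapping)  # index already exists
print(first, second)  # True False
```

With a real client you would call `ensure_index(es, index_name, indexMapping)` in place of the direct `es.indices.create(...)` call.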
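Ingesting Data from a CSV File:
With the index in place, we can load the CSV file into a pandas DataFrame, convert each row to a dictionary, and index the rows one at a time.
import pandas as pd

# pandas labels a header-less first column 'Unnamed: 0'; rename it to 'id'
df = pd.read_csv("constitution.csv", index_col=False)
df.rename(columns={'Unnamed: 0': 'id'}, inplace=True)
df.head()

# One dictionary per row, keyed by column name
record_list = df.to_dict("records")

# Push each record to the Elasticsearch index
for record in record_list:
    try:
        es.index(index=index_name, document=record)
    except Exception as e:
        # Report the failure and continue with the next record
        print(e)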
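Indexing one document per request works for small files, but each call is a separate round trip. For larger CSVs, the client's `elasticsearch.helpers.bulk` function can send many documents per request. The sketch below is one possible way to prepare the records for it; `clean_record`, `to_actions`, and `bulk_index` are hypothetical helper names. It also replaces NaN values, which pandas uses for empty cells and which Elasticsearch typically rejects as invalid JSON, with `None`.

```python
import math


def clean_record(record):
    """Replace NaN floats (pandas' marker for empty cells) with None."""
    return {
        key: (None if isinstance(value, float) and math.isnan(value) else value)
        for key, value in record.items()
    }


def to_actions(records, index_name, id_field=None):
    """Yield one bulk action per record in the format helpers.bulk expects."""
    for record in records:
        doc = clean_record(record)
        action = {"_index": index_name, "_source": doc}
        # Optional: reuse a column as the document ID so reruns overwrite
        # existing documents instead of creating duplicates.
        if id_field is not None and doc.get(id_field) is not None:
            action["_id"] = doc[id_field]
        yield action


def bulk_index(es, records, index_name, id_field=None):
    """Send all records through the bulk helper; returns (success_count, errors)."""
    # Imported lazily so the pure helpers above work without the package.
    from elasticsearch.helpers import bulk
    return bulk(es, to_actions(records, index_name, id_field))


# A made-up row with an empty Page_No cell, to show the shape of an action.
sample = [{"id": 0, "Book": "Anti Terrorism Act 1997", "Page_No": float("nan")}]
actions = list(to_actions(sample, "my_index"))
```

With a live connection, `bulk_index(es, record_list, index_name)` would take the place of the `for` loop above.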
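Performing Queries:
Once the data is indexed, we can search it. The example below runs a match query against the Book field and prints one field from each hit.
book = "Anti Terrorism Act 1997"

# Execute the search
result = es.search(index=index_name, query={
    "match": {
        "Book": book
    }
})
print(result)

# Print the results
# (assumes the CSV has a Textual_Metadata column; it is not part of the
# explicit mapping, so Elasticsearch would map it dynamically)
for hit in result['hits']['hits']:
    print(hit['_source']['Textual_Metadata'])
You can write and design queries according to your needs and data.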
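As one illustration of tailoring a query, a `bool` query can combine an exact filter on a keyword field with a full-text match on a text field. The sketch below uses the `Book` and `Chapter` fields from the mapping; the helper names `build_book_query` and `search_book` and the search text "Preliminary" are placeholders for illustration, not values taken from the dataset.

```python
def build_book_query(book, text=None, text_field="Chapter"):
    """Build a bool query: exact filter on Book, optional full-text match."""
    # "filter" clauses restrict the results without affecting relevance scores.
    query = {"bool": {"filter": [{"term": {"Book": book}}]}}
    if text:
        # "must" clauses have to match and contribute to the relevance score.
        query["bool"]["must"] = [{"match": {text_field: text}}]
    return query


def search_book(es, index_name, book, text=None, size=10):
    """Run the query and return the stored documents of the top hits."""
    response = es.search(index=index_name,
                         query=build_book_query(book, text),
                         size=size)
    return [hit["_source"] for hit in response["hits"]["hits"]]


# Placeholder search text, shown only to illustrate the query body.
example_query = build_book_query("Anti Terrorism Act 1997", text="Preliminary")
```

With a live connection, `search_book(es, index_name, "Anti Terrorism Act 1997", text="Preliminary")` would return matching documents from that book, ranked by how well the `Chapter` field matches the text.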
Conclusion:
In this blog post, we’ve explored the process of utilizing Elasticsearch with Python for efficient data indexing and searching. We’ve covered connecting to Elasticsearch instances, creating indices with custom mappings, ingesting data from CSV files, and executing queries to retrieve relevant information.
By harnessing the power of Elasticsearch alongside Python, developers can build robust search functionalities into their applications, enabling seamless integration with structured data sources like CSV files. Whether working with Elasticsearch Cloud or local instances, the Elasticsearch Python client provides a versatile and intuitive interface, empowering developers to leverage Elasticsearch’s capabilities effectively.
Rawaha Javed
Associate Consultant