
Data Engineer Resume


SUMMARY

  • Around 7 years of professional experience as a Data Engineer designing data-intensive applications using the Hadoop ecosystem, big data analytics, cloud data engineering, data warehousing/data marts and OLAP cubes, data visualization, reporting, and SQL
  • Knowledge of handling structured (tabular) as well as unstructured (NoSQL) datasets
  • Proficient in Spark with Python (PySpark), Sqoop, Hive, and Kafka
  • Strong knowledge of writing SQL queries and shell scripts
  • Practical understanding of data modeling (dimensional and relational) concepts such as ER modeling, star-schema modeling, snowflake-schema modeling, and fact and dimension modeling
  • Extensive AWS experience, including services such as EMR, EC2, IAM, Lambda, and S3
  • Hands-on experience in Bash scripting and building data pipelines.
  • Strong knowledge of data preparation, data modeling, and data visualization using Power BI and Tableau, including developing reports and dashboards in both
  • Experience using Sqoop to import and export data between relational database management systems and Hive/HDFS
  • Knowledge of various file formats in HDFS such as Avro, ORC, and Parquet
  • Excellent communication and interpersonal skills; keen to adopt new technologies
  • Experience with partitioning and bucketing concepts in Hive; designed and managed external tables in Hive to optimize performance (a minimal PySpark sketch follows this list)
  • Extensive knowledge of Python packages such as pandas, NumPy, Matplotlib, and scikit-learn for deriving relationships in the data, used hand in hand with SQL, HiveQL, and PySpark MLlib to reduce processing times for generated reports (see the pandas/scikit-learn sketch after this list)
  • Building and productionizing predictive models on large datasets by utilizing advanced statistical modeling, machine learning, or other data mining techniques.
  • Hands-on experience with different programming languages such as Python, SAS, C++, Java, and C
  • In-depth knowledge of Hadoop architecture and its components such as YARN, HDFS, NameNode, DataNode, JobTracker, ApplicationMaster, ResourceManager, TaskTracker, and the MapReduce programming paradigm.
  • Strong knowledge of distributed-systems architecture and parallel processing; in-depth understanding of the MapReduce programming paradigm and the Spark execution framework.
  • Expert in designing parallel jobs using various stages such as Join, Merge, Lookup, Remove Duplicates, Filter, Dataset, Lookup File Set, Complex Flat File, Modify, Aggregator, and XML.
  • Experienced in scraping data from the web by writing Python scripts with Beautiful Soup, and with Selenium WebDriver for dynamic webpage scraping (see the scraping sketch after this list).
  • Experienced in improving the performance and optimization of existing algorithms in Hadoop using Spark, via SparkContext, Spark SQL, the DataFrame API, Spark Streaming, and pair RDDs; worked explicitly with PySpark and Scala (see the optimization sketch after this list).
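
To ground the Hive partitioning and external-table bullet above, here is a minimal PySpark sketch, assuming a Hive-enabled SparkSession; the table name, columns, and S3 location are all hypothetical.

```python
from pyspark.sql import SparkSession

# Assumes Spark built with Hive support; every name below is hypothetical.
spark = (
    SparkSession.builder
    .appName("hive-external-table-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# External table: Hive owns only the metadata; the Parquet files stay at
# LOCATION, so dropping the table does not delete the underlying data.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales_events (
        order_id STRING,
        amount   DOUBLE
    )
    PARTITIONED BY (event_date STRING)   -- one directory per day under LOCATION
    STORED AS PARQUET
    LOCATION 's3://example-bucket/warehouse/sales_events/'
""")

# Register partition directories already present under LOCATION.
spark.sql("MSCK REPAIR TABLE sales_events")

# Filtering on the partition column prunes the scan to matching directories.
spark.sql("SELECT COUNT(*) FROM sales_events WHERE event_date = '2024-01-01'").show()
```

Bucketing would add a CLUSTERED BY ... INTO n BUCKETS clause to the same DDL; the partition filter is what delivers the performance win mentioned in the bullet.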
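The pandas/NumPy/scikit-learn bullet, in sketch form: synthetic data stands in for a report extract, with a correlation pass followed by a simple predictive model. The column names and revenue formula are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for an extract that would normally come from Hive/SQL.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "units_sold": rng.integers(1, 100, size=1_000),
    "discount":   rng.uniform(0.0, 0.3, size=1_000),
})
df["revenue"] = df["units_sold"] * (1.0 - df["discount"]) * 9.99  # invented formula

# Pairwise correlations: a quick first look at relationships in the data.
print(df.corr())

# Simple predictive model of revenue from the other columns.
model = LinearRegression().fit(df[["units_sold", "discount"]], df["revenue"])
print(model.coef_, model.intercept_)
```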
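The web-scraping bullet, in sketch form: Selenium WebDriver renders a JavaScript-heavy page, then Beautiful Soup parses the rendered HTML. The URL and CSS selector are hypothetical, and a Chrome driver is assumed to be available (Selenium 4 can resolve one automatically).

```python
from bs4 import BeautifulSoup
from selenium import webdriver

URL = "https://example.com/listings"  # hypothetical dynamic page

driver = webdriver.Chrome()  # Selenium 4 locates a driver via Selenium Manager
try:
    driver.get(URL)
    html = driver.page_source  # fully rendered DOM, including JS-injected content
finally:
    driver.quit()

# Parse the rendered HTML; the selector below is hypothetical.
soup = BeautifulSoup(html, "html.parser")
titles = [node.get_text(strip=True) for node in soup.select("h2.listing-title")]
print(titles)
```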
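And the Spark-optimization bullet, in sketch form: a broadcast join avoids shuffling the large fact table, and caching keeps a reused aggregate in memory across downstream report queries. Paths and column names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("spark-optimization-sketch").getOrCreate()

# Hypothetical inputs: a large fact table and a small dimension table.
orders  = spark.read.parquet("s3://example-bucket/orders/")   # large
regions = spark.read.parquet("s3://example-bucket/regions/")  # small

# Broadcasting the small side ships it to every executor, so the large side
# is joined in place instead of being shuffled across the cluster.
joined = orders.join(F.broadcast(regions), on="region_id")

# Cache an aggregate that several reports reuse, instead of recomputing it.
daily = (
    joined.groupBy("region_name", "order_date")
          .agg(F.sum("amount").alias("total_amount"))
          .cache()
)
daily.show(5)
```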

TECHNICAL SKILLS

Programming: Python, R, SAS, Java, Shell, C, C++

Big Data Technologies: Hadoop, MapReduce, HDFS, Spark, Hive, Sqoop, Kafka, Mahout, Spark MLlib

Databases: MySQL, Oracle, SQL Server, MongoDB, Cassandra, PostgreSQL

Cloud Technologies: Google BigQuery, AWS (S3, EC2), Microsoft Azure

Reporting: Power BI, Tableau

Tools: PyCharm, Eclipse, SQL Server Management Studio, Spyder, Access, Erwin

Operating Systems: Windows 7/8/XP/2011/Vista, Ubuntu, macOS
