
Data Engineer Resume


TX

SUMMARY

  • 5 years of experience in developing applications using technologies like Hadoop, Spark, Hive and Sqoop
  • Experience in developing Spark jobs in Python to process large volumes of data as per requirements
  • Worked on PySpark and Spark-shell scripts for processing large datasets
  • Experienced in analyzing large datasets to find patterns and insights within structured, semi-structured and unstructured data
  • Experience in designing and developing applications in Spark using Python to compare the performance of Spark with Hive and SQL
  • Experience with Google Cloud Platform services like Dataproc and Cloud Storage buckets to plan, configure, deploy and operate cloud solutions
  • Performed reads and writes between Spark and MongoDB
  • Good experience working with the Spark ecosystem, using Spark SQL and Scala queries on formats like text, CSV, Avro, SequenceFile and XML
  • Proficient in querying MongoDB; wrote a program to convert MongoDB collections into Hive tables
  • Experienced in cluster planning, designing, deploying, performance tuning and monitoring Hadoop ecosystem
  • Performed advanced analytics on Hadoop using Spark with Hive and SQL
  • Excellent programming skills at a higher level of abstraction using Scala and Python
  • Experience in developing Spark applications using Spark tools like RDD transformations, Spark core, Spark Streaming and Spark SQL
  • Created Hive tables to store structured data into HDFS and processed it using HiveQL
  • Good experience in optimizing MapReduce algorithms using mappers, reducers and partitioners to deliver the best results for large datasets
  • In depth understanding/knowledge of Hadoop Architecture
  • Experienced in software design, development and implementation of client/server web-based applications
  • Proven ability to manage all stages of project development; strong problem-solving and analytical skills with the ability to make balanced and independent decisions
  • Wrote Spark programs to perform data cleansing, transformation, and joins
  • Commitment to development best practices including coding standards, naming conventions, commenting, code modularization and reuse
  • Decisive problem solver able to execute innovative solutions and process improvements to meet defined business goals
  • Good knowledge of the SDLC process

TECHNICAL SKILLS

Big Data Technologies: HDFS, Hive, Sqoop, Spark, ZooKeeper, MapReduce, Spark SQL

Programming Languages: Scala, Python, HiveQL, PL/SQL

Operating Systems: Windows, Linux

RDBMS: MySQL, SQL Server

NOSQL Databases: MongoDB

Database Tools: SQL Developer, Robo3T, PyCharm

PROFESSIONAL EXPERIENCE

Confidential, TX

Data Engineer

Responsibilities:

  • Developed and engineered data from various sources such as SQL Server and MySQL, transforming existing raw metadata to meet reporting requirements using Spark, Kafka, Python, and Hive
  • Involved in the complete Big Data flow of the application, from ingesting data from upstream sources into HDFS to processing and analyzing the data in HDFS
  • Responsible for building scalable distributed data solutions using Spark
  • Involved in creating Hive tables, loading data and writing Hive queries, which run internally as MapReduce jobs
  • Created Hive external and partitioned tables, using Hive indexes and HQL to ease data analytics
  • Implemented partitioning, dynamic partitions and buckets in Hive to optimize queries
  • Wrote Spark programs to perform data cleansing, transformation and joins.
  • Wrote ETL jobs using Spark and worked on tuning the performance of Hive queries.
  • Designed and developed Spark jobs in Python to implement an end-to-end data pipeline for batch processing, and used the Spark UI to monitor job processing
  • Developed Sqoop scripts and Sqoop jobs to ingest data from a client-provided database in batches on an incremental basis
  • Wrote Python programs for Spark transformations of DataFrames and RDDs
  • Worked on storing data in MongoDB collections
  • Analyzed various file types such as ORC, Parquet, JSON, CSV and XML with Python in spark-submit jobs executed on a Dataproc cluster
  • Experienced in full life cycle ETL implementation using SQL Server; helped design the data warehouse by defining facts, dimensions and the relationships between them, and applied corporate naming-convention standards
  • Worked closely with the QA and production support teams by providing components, documentation, validation and knowledge transfer on new projects, and by debugging issues

Environment: Hadoop Ecosystem (HDFS, Hive, Sqoop), Spark, Scala, Python, MongoDB, Linux OS, Windows OS, Google Cloud Platform (Dataproc and Cloud Storage buckets)

Confidential

Data Engineer

Responsibilities:

  • Designed, planned, and developed programs to perform automated extract, transform and load data between data sources when working with large data sets.
  • Responsible for building scalable distributed solutions using Hadoop
  • Developed and built data pipelines for deploying Hadoop applications and assisted the team in managing compliance with the organization's Hadoop ecosystem standards
  • Transformed, Analyzed and validated large datasets and built automated and optimized pipelines.
  • Experienced with partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance
  • Worked on Big Data infrastructure for batch processing and real-time processing
  • Responsible for interpreting the requirements of Big Data analytics use cases and scenarios, and driving the design and implementation of specific data models to ultimately support better business decisions
  • Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data
  • Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on formats like text and CSV files
  • Developed Spark scripts using Python shell commands as per requirements
  • Worked with the production support team by providing components, documentation, validation and knowledge transfer on new projects, and by debugging issues

Environment: Hadoop, Hive, Sqoop, HDFS, HBase, MySQL, MongoDB
