Data Engineer Resume
Mason, OH
SUMMARY
- 5+ years of overall IT experience, including 3+ years of comprehensive experience as a Hadoop Developer working with related big data technologies.
- Capable of processing large sets of structured, semi-structured, and unstructured data and supporting application and systems architecture.
- Familiar with data architecture, including data ingestion pipeline design, Hadoop information architecture, data modeling, data mining, machine learning, and advanced data processing; experienced in optimizing ETL workflows.
- Experience with Spark SQL queries and DataFrames: importing data from source systems, performing transformations and read/write operations, and saving results to output directories in HDFS.
- Proficient in MySQL and Oracle, with development experience in stored procedures and SQL queries.
- Extensive knowledge of Apache Hadoop installation and multi-node configuration on AWS EC2.
- Moved files between HDFS and AWS S3 and worked extensively with S3 buckets in AWS.
- Wrote Python scripts to parse XML/JSON documents and load the data into databases (a minimal sketch follows this list).
- Working knowledge of and hands-on experience with Agile methodology.
- Expertise in developing applications using IDEs such as Eclipse and NetBeans.
- Knowledge of creating Hadoop clusters on EC2 and VMs with Cloudera Manager (CDH5) on Linux distributions such as CentOS and Ubuntu.
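A minimal sketch of the XML/JSON parsing and loading described above, assuming a hypothetical record layout and an "events" staging table; sqlite3 stands in for the production MySQL/Oracle target.

```python
import json
import sqlite3
import xml.etree.ElementTree as ET

def parse_json_records(path):
    """Yield (id, payload) rows from a JSON array file."""
    with open(path) as fh:
        for rec in json.load(fh):
            yield rec["id"], json.dumps(rec)

def parse_xml_records(path):
    """Yield (id, payload) rows from <record id="..."> elements."""
    for elem in ET.parse(path).getroot().iter("record"):
        yield elem.get("id"), ET.tostring(elem, encoding="unicode")

def load(rows, db_path="staging.db"):
    """Insert parsed rows into a staging table (SQLite here; MySQL/Oracle in production)."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS events (event_id TEXT, payload TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    conn.commit()
    conn.close()

if __name__ == "__main__":
    load(parse_json_records("input.json"))
```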
TECHNICAL SKILLS
Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Hive, Sqoop, Cassandra
Programming Languages: Python, Java, C, Scala
Framework: Apache Hadoop, Apache Spark
Databases: Oracle, SQL Server, MySQL, PostgreSQL
Tools: Eclipse, JDeveloper, MS Visual Studio
Cloud: AWS
Version Control: Git, SVN
PROFESSIONAL EXPERIENCE
Confidential, Mason, OH
Data Engineer
Responsibilities:
- Working on the re-architecture of an established application built around a large number of stored procedures in Sybase.
- Used Sqoop imports to move data from Sybase into Hadoop and created tables on top of the source tables.
- Extracted business logic from the stored procedures and re-implemented it in PySpark scripts to optimize job performance (see the sketch at the end of this section).
- Involved in code/design analysis, strategy development and project planning.
- Used the Spark scripts to create Hive tables from the Sqoop-imported data and wrote validation queries in Hue.
- Used bcp to bring production data into Hadoop for final testing before integration.
- Managed and reviewed Hadoop log files.
- Tested raw data and executed performance scripts.
- Used Jira for project tracking, bug tracking, and project management.
Environment: Python, CDH, HDFS, Spark 2.0.0, Hive, Sqoop, MapReduce, PyCharm, SQL, Sybase, Jira.
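A minimal PySpark sketch of the stored-procedure-to-Spark port described above; the table and column names (staging.orders_raw, curated.orders_agg, order_amount, region) are hypothetical stand-ins for the actual Sybase business logic.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("sybase-sp-port")
         .enableHiveSupport()
         .getOrCreate())

# Read the Hive table created on top of the Sqoop-imported source data.
orders = spark.table("staging.orders_raw")

# Business logic lifted from the stored procedure, expressed as DataFrame operations.
agg = (orders
       .filter(F.col("order_amount") > 0)
       .groupBy("region")
       .agg(F.sum("order_amount").alias("total_amount"),
            F.count("*").alias("order_count")))

# Persist the result as a Hive table so it can be validated with queries in Hue.
agg.write.mode("overwrite").saveAsTable("curated.orders_agg")
```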
Confidential
Big Data Engineer
Responsibilities:
- Worked on creating an automation system that gives end users the predictions they need for upcoming crop planting.
- Used big data from multiple sources such as PostgreSQL and AWS Athena to create the required input datasets for the scientific algorithms to run.
- Responsible for creating PostgreSQL tables with the most accurate data needed by the users.
- Delivered results on time so the scientists could continue their research on seed development.
- Responsible for creating a Trait Introgression Management System following Agile methodology; brainstormed with the team to come up with various solutions.
- Built the system on Cloudera; created Spark scripts for the back-end data along with Sqoop jobs, using Hive and Impala for the database layer.
- Involved in migrating the Trait Introgression system from Cloudera to AWS; created nightly Airflow jobs that rebuild tables with the most recent data for the UI (see the sketch at the end of this section).
- Worked on AWS services such as EMR, EC2, SQS, CloudWatch, Glue, S3, Athena, Lambda, ELB, and RDS.
- Used a Git repository for code merging and for maintaining the code and queries that build back-end tables and daily jobs.
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the DWH.
- Worked on PySpark scripts to implement required code changes, resolve bugs, and support new development.
- Used Jira as the ticketing tool for daily stories and participated in sprint planning.
- Responsible for building a database of inbred data for business users within the scope of an enterprise application.
- Worked on the ETL process for big data from AWS, applying required changes and data quality checks based on SME requirements.
- Wrote SQL queries, stored procedures, and triggers to perform back-end database operations.
Environment: Python, CDH, HDFS, Spark 2.0.0, Hive, Impala, Sqoop, MapReduce, PyCharm, PuTTY, Agile Methodology, Jira
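A hedged sketch of the nightly Airflow refresh mentioned above; the DAG id, schedule, and refresh_ui_tables callable are illustrative placeholders rather than the production job.

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def refresh_ui_tables(**context):
    # Placeholder for the logic that rebuilds the UI-facing tables
    # (e.g. querying Athena/Glue and writing refreshed results back out).
    print("refreshing UI tables for", context["ds"])

with DAG(
    dag_id="nightly_ui_table_refresh",
    start_date=datetime(2021, 1, 1),
    schedule_interval="0 2 * * *",  # run nightly at 02:00
    catchup=False,
) as dag:
    PythonOperator(
        task_id="refresh_ui_tables",
        python_callable=refresh_ui_tables,
    )
```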
Confidential
Hadoop Developer
Responsibilities:
- Designed applications and contributed to all phases of the SDLC, including requirements analysis, design, coding, testing, and requirements specification documentation.
- Used Git repository for code merging.
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the DWH.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with DWH reference tables and historical metrics.
- Performed Data Ingestion from multiple internal clients using Apache Kafka.
- Hands-on experience in AWS provisioning and good knowledge of AWS services such as EC2, S3, EMR, ELB, RDS, and Lambda.
- Worked on Spark Streaming with Apache Kafka for real-time data processing (see the sketch at the end of this section).
- Experience in designing and implementing distributed data processing pipelines using Spark.
- Wrote SQL queries, stored procedures, and triggers to perform back-end database operations.
- Shared responsibility for administration of Hadoop, Hive and Pig.
Environment: Hadoop, HDFS, Spark, MapReduce, Sqoop, Kafka, Cloudera CDH, SQL, Scrum, Jira, PyCharm.
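An illustrative sketch of Kafka ingestion with Spark Structured Streaming; the broker address, topic name, and HDFS paths are assumptions, and the original jobs may have used the older DStream API.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka-stream-ingest").getOrCreate()

# Subscribe to the raw events topic (broker and topic names are hypothetical).
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "raw-events")
          .load())

# Kafka values arrive as bytes; cast to string before downstream parsing.
parsed = events.select(F.col("value").cast("string").alias("json_payload"))

# Land the stream on HDFS with checkpointing for reliable file output.
query = (parsed.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/raw_events")
         .option("checkpointLocation", "hdfs:///checkpoints/raw_events")
         .start())
query.awaitTermination()
```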
Confidential, Mountlake, Washington
Developer
Responsibilities:
- Worked on multiple projects, gathered requirements from business stakeholders and performed process analysis.
- Coordinated with client management and project team to identify and meet project objectives.
- Installed and configured MapReduce, HIVE and the HDFS; implemented CDH3 Hadoop cluster on CentOS. Assisted with performance tuning and monitoring.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Supported code/design analysis, strategy development and project planning.
- Created reports for the BI team, using Sqoop to bring data into HDFS and Hive.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing (see the sketch at the end of this section).
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
- Used JDBC to connect to the back-end databases, Oracle and SQL Server 2005.
Environment: Java 1.4, J2EE (JSP/Servlets, JDBC, XML), Eclipse, JSON, SQL, Sqoop, Hive, Hadoop.
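The cleaning and preprocessing jobs in this role were written in Java MapReduce; the sketch below shows the same idea as a Hadoop Streaming mapper in Python, assuming a hypothetical three-column tab-delimited layout.

```python
import sys

def clean(line):
    """Drop malformed rows and normalise the fields of a tab-delimited record."""
    parts = [p.strip() for p in line.rstrip("\n").split("\t")]
    if len(parts) != 3 or not parts[0]:
        return None  # skip records that do not match the expected layout
    user_id, event_ts, amount = parts
    return "\t".join([user_id.lower(), event_ts, amount])

if __name__ == "__main__":
    for raw in sys.stdin:
        cleaned = clean(raw)
        if cleaned is not None:
            print(cleaned)
```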
Confidential
Software Engineer (Internship)
Responsibilities:
- Prepared use cases, designed and developed object models and class diagrams.
- Developed SQL statements to improve back-end communications.
- Incorporated a custom logging mechanism for tracing errors and resolved all issues and bugs before deploying the application to the WebSphere server.
- Developed user and technical documentation.
- Monitored and reported project status and risks using established metrics.
- Involved in fixing bugs and unit testing with test cases using JUnit.
- Involved in building and deploying the application with Maven scripts that generate WAR, EAR, and JAR files.
- Worked extensively with AngularJS, creating controllers and developing the service layer to retrieve data from the database.
Environment: Java, XML, Oracle 9i, Eclipse, SQL, JUnit