Data Engineer Resume

Mason, OH

SUMMARY

  • 5+ years of overall IT experience, including 3+ years of comprehensive experience as a Hadoop Developer working with Hadoop and related technologies.
  • Capable of processing large sets of structured, semi-structured, and unstructured data and supporting systems and application architecture.
  • Familiar with data architecture, including data ingestion pipeline design, Hadoop information architecture, data modeling, data mining, machine learning, and advanced data processing. Experienced in optimizing ETL workflows.
  • Experience working with Spark SQL queries and DataFrames: importing data from data sources, performing transformations and read/write operations, and saving results to output directories in HDFS (a minimal sketch appears after this list).
  • Proficient in MySQL and Oracle, with development experience in stored procedures and SQL queries.
  • Extensive knowledge of Apache Hadoop installation and multi-node configuration on AWS EC2.
  • Involved in file movement between HDFS and AWS S3, and worked extensively with S3 buckets in AWS.
  • Wrote Python scripts to parse XML/JSON documents and load the data into databases.
  • Working knowledge of and hands-on experience with Agile methodology.
  • Expertise in developing various applications using IDEs such as Eclipse and NetBeans.
  • Knowledge of creating Hadoop clusters on EC2 and VMs with Cloudera Manager (CDH5) on Linux distributions such as CentOS and Ubuntu.
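
The Spark SQL work summarized above follows a read / transform / write pattern. Below is a minimal PySpark sketch of that flow; the file path, view name, and columns (orders.csv, order_date, amount) are hypothetical placeholders rather than the actual sources.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("csv-to-hdfs").getOrCreate()

    # Read a source file into a DataFrame (path and schema are assumed).
    orders = (spark.read
              .option("header", "true")
              .option("inferSchema", "true")
              .csv("/data/incoming/orders.csv"))

    # Register the DataFrame and apply a transformation with Spark SQL.
    orders.createOrReplaceTempView("orders")
    daily_totals = spark.sql("""
        SELECT order_date, SUM(amount) AS total_amount
        FROM orders
        GROUP BY order_date
    """)

    # Save the results to an output directory in HDFS.
    daily_totals.write.mode("overwrite").parquet("hdfs:///user/etl/output/daily_totals")

    spark.stop()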

TECHNICAL SKILLS

Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Hive, Sqoop, Cassandra

Programming Languages: Python, Java, C, Scala

Frameworks: Apache Hadoop, Apache Spark

Databases: Oracle, SQL Server, MySQL, PostgreSQL

Tools: Eclipse, JDeveloper, MS Visual Studio

Cloud: AWS

Version Control: GIT, SVN

PROFESSIONAL EXPERIENCE

Confidential, Mason, OH

Data Engineer

Responsibilities:

  • Working on the re-architecture of an established application with a large number of stored procedures in Sybase.
  • Used Sqoop import to move data from Sybase into Hadoop and created tables on top of the source tables.
  • Extracted the business logic from the stored procedures and re-implemented it as PySpark scripts to improve job performance (see the sketch after this list).
  • Involved in code/design analysis, strategy development, and project planning.
  • Used the Spark scripts to create Hive tables from the Sqoop-imported data and wrote validation queries in Hue.
  • Also used bcp to bring production data into Hadoop for final testing before integration.
  • Managed and reviewed Hadoop log files.
  • Tested raw data and executed performance scripts.
  • Used Jira for project tracking, bug tracking, and project management.
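
As referenced above, a rough sketch of how stored-procedure logic can be re-expressed as a PySpark job over the Sqoop-imported Hive tables; the table and column names (staging.orders, staging.customers, reporting.customer_totals) are hypothetical stand-ins for the actual Sybase objects.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # enableHiveSupport() lets the job see the Hive tables created on top of
    # the Sqoop-imported Sybase data.
    spark = (SparkSession.builder
             .appName("sybase-proc-to-pyspark")
             .enableHiveSupport()
             .getOrCreate())

    orders = spark.table("staging.orders")        # hypothetical imported table
    customers = spark.table("staging.customers")  # hypothetical imported table

    # Express the procedure's join/aggregation as DataFrame operations so Spark
    # can parallelize the work instead of looping row by row.
    customer_totals = (orders
                       .join(customers, "customer_id")
                       .groupBy("customer_id", "customer_name")
                       .agg(F.sum("order_amount").alias("total_amount")))

    # Persist the result as a Hive table for validation queries in Hue.
    customer_totals.write.mode("overwrite").saveAsTable("reporting.customer_totals")

    spark.stop()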

Environment: Python, CDH, HDFS, Spark 2.0.0, Hive, Sqoop, MapReduce, PyCharm, SQL, Sybase, JIRA.

Confidential

Big Data Engineer

Responsibilities:

  • Worked on creating an automation system that gives end users the predictions they need for upcoming crop planting.
  • Used big data from multiple sources such as PostgreSQL and AWS Athena to create the required input datasets for the scientific algorithms to run.
  • Responsible for creating tables in PostgreSQL with the most accurate data needed by the users.
  • Successfully delivered results on time so the scientists could continue their research on seed development.
  • Responsible for creating a Trait Introgression Management System following Agile methodology, and brainstormed with the team to come up with various solutions.
  • Built the system on Cloudera; created Spark scripts for the back-end data along with Sqoop jobs, using Hive and Impala for the database layer.
  • Involved in migrating the Trait Introgression system from Cloudera to AWS. Created nightly Airflow jobs that rebuild tables with the most recent data for the UI (a minimal DAG sketch follows this list).
  • Worked on AWS services such as EMR, EC2, SQS, CloudWatch, Glue, S3, Athena, Lambda, ELB, and RDS.
  • Used a Git repository for code merging and for maintaining the code and queries that build back-end tables and daily jobs.
  • Developed MapReduce programs to parse the raw data, populate staging tables, and store the refined data in partitioned tables in the DWH.
  • Worked on PySpark scripts to make required code changes, resolve bugs, and carry out new development.
  • Used Jira as the ticketing tool for daily stories and participated in sprint planning.
  • Responsible for building a database of inbred data for the business users in scope for an enterprise application.
  • Working on the ETL process for big data from AWS, with required changes and data quality checks driven by SME requirements.
  • Wrote SQL queries, stored procedures, and triggers to perform back-end database operations.
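
As referenced in the Airflow bullet above, a minimal Airflow 2.x-style DAG sketch of a nightly table rebuild; the DAG id, spark-submit commands, and script paths are hypothetical.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # Nightly DAG that rebuilds the UI-facing tables and then validates them.
    with DAG(
        dag_id="nightly_ui_table_refresh",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:

        rebuild_tables = BashOperator(
            task_id="rebuild_ui_tables",
            bash_command="spark-submit /opt/jobs/rebuild_ui_tables.py",
        )

        validate_counts = BashOperator(
            task_id="validate_row_counts",
            bash_command="spark-submit /opt/jobs/validate_row_counts.py",
        )

        rebuild_tables >> validate_counts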

Environment: Python, CDH, HDFS, Spark 2.0.0, Hive, Impala, Sqoop, MapReduce, PyCharm, PuTTY, Agile Methodology, JIRA

Confidential

Hadoop Developer

Responsibilities:

  • Designed applications, contributing to all phases of the SDLC, including requirements analysis, design, coding, testing, and requirement specification documentation.
  • Used Git repository for code merging.
  • Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the DWH.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with DWH reference tables and historical metrics.
  • Performed Data Ingestion from multiple internal clients using Apache Kafka.
  • Hands-on experience in AWS provisioning and good knowledge of AWS services such as EC2, S3, EMR, ELB, RDS, and Lambda.
  • Worked on Spark Streaming using Apache Kafka for real-time data processing (a minimal streaming sketch follows this list).
  • Experience in designing and implementing distributed data processing pipelines using Spark.
  • Wrote SQL queries, stored procedures, and triggers to perform back-end database operations.
  • Shared responsibility for administration of Hadoop, Hive and Pig.
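
As referenced in the Spark Streaming bullet above, one common way to consume Kafka from Spark is Structured Streaming. This minimal sketch assumes the spark-sql-kafka connector is on the classpath; the broker addresses, topic, and HDFS paths are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("kafka-stream-ingest").getOrCreate()

    # Subscribe to a Kafka topic (brokers and topic name are assumed).
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
              .option("subscribe", "client_events")
              .load())

    # Kafka delivers keys/values as binary; cast the value to a string.
    parsed = events.select(F.col("value").cast("string").alias("raw_event"))

    # Write the stream to HDFS in micro-batches, with checkpointing for recovery.
    query = (parsed.writeStream
             .format("parquet")
             .option("path", "hdfs:///data/streams/client_events")
             .option("checkpointLocation", "hdfs:///checkpoints/client_events")
             .start())

    query.awaitTermination()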

Environment: Hadoop, HDFS, Spark, MapReduce, Sqoop, Kafka, Cloudera CDH, SQL, Scrum, JIRA, PyCharm.

Confidential, Mountlake, Washington

Developer

Responsibilities:

  • Worked on multiple projects, gathered requirements from business stakeholders and performed process analysis.
  • Coordinated with client management and project team to identify and meet project objectives.
  • Installed and configured MapReduce, Hive, and HDFS; implemented a CDH3 Hadoop cluster on CentOS. Assisted with performance tuning and monitoring.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Supported code/design analysis, strategy development and project planning.
  • Created reports for the BI team using Sqoop to export data into HDFS and Hive.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing (an illustrative mapper sketch follows this list).
  • Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
  • Used JDBC to connect to back-end databases, Oracle and SQL Server 2005.
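
The data cleaning and preprocessing jobs referenced above were written in Java; as a language-neutral illustration of that kind of cleaning step, here is a minimal Hadoop Streaming-style mapper in Python. The pipe-delimited three-field record layout (id|date|amount) is assumed, not taken from the original project.

    #!/usr/bin/env python
    # Minimal Hadoop Streaming mapper: drop malformed records, normalize the
    # amount field, and emit tab-separated output for the staging tables.
    import sys

    EXPECTED_FIELDS = 3

    for line in sys.stdin:
        fields = [f.strip() for f in line.rstrip("\n").split("|")]

        # Skip records that do not match the assumed id|date|amount layout.
        if len(fields) != EXPECTED_FIELDS:
            continue

        record_id, record_date, amount = fields

        # Skip records whose amount is not numeric.
        try:
            amount = "{:.2f}".format(float(amount))
        except ValueError:
            continue

        print("\t".join([record_id, record_date, amount]))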

Environment: Java 1.4, J2EE (JSP/Servlets, JDBC, XML), Eclipse, JSON, SQL, Sqoop, Hive, Hadoop.

Confidential

Software Engineer (Internship)

Responsibilities:

  • Prepared use cases, designed and developed object models and class diagrams.
  • Developed SQL statements to improve back-end communications.
  • Incorporated a custom logging mechanism for tracing errors, and resolved all issues and bugs before deploying the application to the WebSphere server.
  • Developed user and technical documentation.
  • Monitored and reported project status and risks using established metrics.
  • Involved in fixing bugs and unit testing with test cases using JUnit.
  • Involved in building and deploying scripts using Maven to generate war, ear and jar files.
  • Worked extensively with AngularJS, creating controllers, and developed the service layer to retrieve data from the database.

Environment: Java, XML, Oracle 9i, Eclipse, SQL, JUnit
