Data Engineer Resume
Mason, OH
SUMMARY
- 5+ years of overall IT experience, including 3+ years of comprehensive experience as a Hadoop Developer working with related big data technologies.
- Capable of processing large sets of structured, semi-structured, and unstructured data and supporting application and systems architecture.
- Familiar with data architecture, including data ingestion pipeline design, Hadoop information architecture, data modeling, data mining, machine learning, and advanced data processing; experienced in optimizing ETL workflows.
- Experience with Spark SQL queries and DataFrames: importing data from source systems, performing transformations and read/write operations, and saving results to output directories in HDFS.
- Proficient in MySQL and Oracle, with development experience in stored procedures and SQL queries.
- Extensive knowledge of Apache Hadoop installation and multi-node configuration on AWS EC2.
- Moved files between HDFS and AWS S3 and worked extensively with S3 buckets in AWS.
- Wrote Python scripts to parse XML/JSON documents and load the data into databases (a minimal sketch follows this list).
- Working knowledge of and hands-on experience with Agile methodology.
- Expertise in developing applications using IDEs such as Eclipse and NetBeans.
- Knowledge of creating Hadoop clusters on EC2 and VMs with Cloudera Manager (CDH5) on Linux distributions such as CentOS and Ubuntu.
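A minimal sketch of the XML/JSON parsing and loading described above, assuming a hypothetical record layout and an "events" staging table; sqlite3 stands in for the production MySQL/Oracle target.

```python
import json
import sqlite3
import xml.etree.ElementTree as ET

def parse_json_records(path):
    """Yield (id, payload) rows from a JSON array file."""
    with open(path) as fh:
        for rec in json.load(fh):
            yield rec["id"], json.dumps(rec)

def parse_xml_records(path):
    """Yield (id, payload) rows from <record id="..."> elements."""
    for elem in ET.parse(path).getroot().iter("record"):
        yield elem.get("id"), ET.tostring(elem, encoding="unicode")

def load(rows, db_path="staging.db"):
    """Insert parsed rows into a staging table (SQLite here; MySQL/Oracle in production)."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS events (event_id TEXT, payload TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    conn.commit()
    conn.close()

if __name__ == "__main__":
    load(parse_json_records("input.json"))
```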
TECHNICAL SKILLS
Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Hive, Sqoop, Cassandra
Programming Languages: Python, Java, C, Scala
Framework: Apache Hadoop, Apache Spark
Databases: Oracle, SQL Server, MySQL, PostgreSQL
Tools: Eclipse, JDeveloper, MS Visual Studio
Cloud: AWS
Version Control: Git, SVN
PROFESSIONAL EXPERIENCE
Confidential, Mason, OH
Data Engineer
Responsibilities:
- Working on the re-architecture of an established application built around a large number of stored procedures in Sybase.
- Used Sqoop imports to move data from Sybase into Hadoop and created tables on top of the source tables.
- Extracted business logic from the stored procedures and re-implemented it in PySpark scripts to optimize job performance (see the sketch at the end of this section).
- Involved in code/design analysis, strategy development and project planning.
- Used the Spark scripts to create Hive tables from the Sqoop-imported data and wrote validation queries in Hue.
- Used bcp to bring production data into Hadoop for final testing before integration.
- Managed and reviewed Hadoop log files.
- Tested raw data and executed performance scripts.
- Used Jira for project tracking, bug tracking, and project management.
Environment: Python, CDH, HDFS, Spark 2.0.0, Hive, Sqoop, MapReduce, PyCharm, SQL, Sybase, Jira.
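A minimal PySpark sketch of the stored-procedure-to-Spark port described above; the table and column names (staging.orders_raw, curated.orders_agg, order_amount, region) are hypothetical stand-ins for the actual Sybase business logic.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("sybase-sp-port")
         .enableHiveSupport()
         .getOrCreate())

# Read the Hive table created on top of the Sqoop-imported source data.
orders = spark.table("staging.orders_raw")

# Business logic lifted from the stored procedure, expressed as DataFrame operations.
agg = (orders
       .filter(F.col("order_amount") > 0)
       .groupBy("region")
       .agg(F.sum("order_amount").alias("total_amount"),
            F.count("*").alias("order_count")))

# Persist the result as a Hive table so it can be validated with queries in Hue.
agg.write.mode("overwrite").saveAsTable("curated.orders_agg")
```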
Confidential
Big Data Engineer
Responsibilities:
- Worked on creating an automation system that gives end users the predictions they need for upcoming crop planting.
- Used big data from multiple sources such as PostgreSQL and AWS Athena to create the required input datasets for the scientific algorithms to run.
- Responsible for creating PostgreSQL tables with the most accurate data needed by the users.
- Delivered results on time so the scientists could continue their research on seed development.
- Responsible for creating a Trait Introgression Management System following Agile methodology; brainstormed with the team to come up with various solutions.
- Built the system on Cloudera; created Spark scripts for the back-end data along with Sqoop jobs, using Hive and Impala for the database layer.
- Involved in migrating the Trait Introgression system from Cloudera to AWS; created nightly Airflow jobs that rebuild tables with the most recent data for the UI (see the sketch at the end of this section).
- Worked on AWS services such as EMR, EC2, SQS, CloudWatch, Glue, S3, Athena, Lambda, ELB, and RDS.
- Used a Git repository for code merging and for maintaining the code and queries that build back-end tables and daily jobs.
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the DWH.
- Worked on PySpark scripts to implement required code changes, resolve bugs, and support new development.
- Used Jira as the ticketing tool for daily stories and participated in sprint planning.
- Responsible for building a database of inbred data for business users within the scope of an enterprise application.
- Worked on the ETL process for big data from AWS, applying required changes and data quality checks based on SME requirements.
- Wrote SQL queries, stored procedures, and triggers to perform back-end database operations.
Environment: Python, CDH, HDFS, Spark 2.0.0, Hive, Impala, Sqoop, MapReduce, PyCharm, PuTTY, Agile Methodology, Jira
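A hedged sketch of the nightly Airflow refresh mentioned above; the DAG id, schedule, and refresh_ui_tables callable are illustrative placeholders rather than the production job.

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def refresh_ui_tables(**context):
    # Placeholder for the logic that rebuilds the UI-facing tables
    # (e.g. querying Athena/Glue and writing refreshed results back out).
    print("refreshing UI tables for", context["ds"])

with DAG(
    dag_id="nightly_ui_table_refresh",
    start_date=datetime(2021, 1, 1),
    schedule_interval="0 2 * * *",  # run nightly at 02:00
    catchup=False,
) as dag:
    PythonOperator(
        task_id="refresh_ui_tables",
        python_callable=refresh_ui_tables,
    )
```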
Confidential
Hadoop Developer
Responsibilities:
- Designed applications and contributed to all phases of the SDLC, including requirements analysis, design, coding, testing, and requirements specification documentation.
- Used Git repository for code merging.
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the DWH.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with DWH reference tables and historical metrics.
- Performed Data Ingestion from multiple internal clients using Apache Kafka.
- Hands-on experience in AWS provisioning and good knowledge of AWS services such as EC2, S3, EMR, ELB, RDS, and Lambda.
- Worked on Spark Streaming with Apache Kafka for real-time data processing (see the sketch at the end of this section).
- Experience in designing and implementing distributed data processing pipelines using Spark.
- Wrote SQL queries, stored procedures, and triggers to perform back-end database operations.
- Shared responsibility for administration of Hadoop, Hive and Pig.
Environment: Hadoop, HDFS, Spark, MapReduce, Sqoop, Kafka, Cloudera CDH, SQL, Scrum, Jira, PyCharm.
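An illustrative sketch of Kafka ingestion with Spark Structured Streaming; the broker address, topic name, and HDFS paths are assumptions, and the original jobs may have used the older DStream API.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka-stream-ingest").getOrCreate()

# Subscribe to the raw events topic (broker and topic names are hypothetical).
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "raw-events")
          .load())

# Kafka values arrive as bytes; cast to string before downstream parsing.
parsed = events.select(F.col("value").cast("string").alias("json_payload"))

# Land the stream on HDFS with checkpointing for reliable file output.
query = (parsed.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/raw_events")
         .option("checkpointLocation", "hdfs:///checkpoints/raw_events")
         .start())
query.awaitTermination()
```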
Confidential, Mountlake, Washington
Developer
Responsibilities:
- Worked on multiple projects, gathered requirements from business stakeholders and performed process analysis.
- Coordinated with client management and project team to identify and meet project objectives.
- Installed and configured MapReduce, HIVE and the HDFS; implemented CDH3 Hadoop cluster on CentOS. Assisted with performance tuning and monitoring.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Supported code/design analysis, strategy development and project planning.
- Created reports for the BI team, using Sqoop to bring data into HDFS and Hive.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing (see the sketch at the end of this section).
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
- Used JDBC to connect to the back-end databases, Oracle and SQL Server 2005.
Environment: Java 1.4, J2EE (JSP/Servlets, JDBC, XML), Eclipse, JSON, SQL, Sqoop, Hive, Hadoop.
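The cleaning and preprocessing jobs in this role were written in Java MapReduce; the sketch below shows the same idea as a Hadoop Streaming mapper in Python, assuming a hypothetical three-column tab-delimited layout.

```python
import sys

def clean(line):
    """Drop malformed rows and normalise the fields of a tab-delimited record."""
    parts = [p.strip() for p in line.rstrip("\n").split("\t")]
    if len(parts) != 3 or not parts[0]:
        return None  # skip records that do not match the expected layout
    user_id, event_ts, amount = parts
    return "\t".join([user_id.lower(), event_ts, amount])

if __name__ == "__main__":
    for raw in sys.stdin:
        cleaned = clean(raw)
        if cleaned is not None:
            print(cleaned)
```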
Confidential
Software Engineer (Internship)
Responsibilities:
- Prepared use cases, designed and developed object models and class diagrams.
- Developed SQL statements to improve back-end communications.
- Incorporated a custom logging mechanism for tracing errors and resolved all issues and bugs before deploying the application to the WebSphere server.
- Developed user and technical documentation.
- Monitored and reported project status and risks using established metrics.
- Involved in fixing bugs and unit testing with test cases using JUnit.
- Involved in building and deploying the application with Maven scripts that generate WAR, EAR, and JAR files.
- Worked extensively with AngularJS, creating controllers and developing the service layer to retrieve data from the database.
Environment: Java, XML, Oracle 9i, Eclipse, SQL, JUnit