Hadoop/Big Data Engineer Resume

Minneapolis, MN


  • 5 years of experience in software development specializing in Data Modeling, ETL, SQL, and NoSQL areas.
  • Expert in all activities related to the development, implementation, administration, and support of ETL processes for large-scale data warehouses using SQL Server SSIS and Talend.
  • Hands-on experience with the Hadoop stack (MapReduce, HDFS, Sqoop, Pig, Hive, HBase, Spark, Flume, Oozie, and ZooKeeper).
  • Well versed in configuring and administering the Hadoop Cluster using Cloudera.
  • Proven expertise in performing analytics on Big Data using MapReduce, Hive, and Pig.
  • Experienced in performing real-time analytics on NoSQL databases such as HBase and Cassandra.
  • Worked with the Oozie workflow engine to schedule time-based jobs that perform multiple actions.
  • Strong understanding of Software Development Lifecycle (SDLC) methodologies, including Waterfall and Agile.
  • Good experience in developing projects using Talend Studio for Big Data.
  • Developed Hive/MapReduce/Spark Python modules for ML & predictive analytics in Hadoop/Hive/Hue on AWS.
  • Experience in importing data in different formats from various RDBMS databases into HDFS and HBase, and exporting it back.
  • Developed Pig Latin scripts for handling business transformations.
  • Implemented Sqoop for large dataset transfers between Hadoop and RDBMS.
  • Excellent working knowledge of SSAS, SSRS, and Crystal Reports.
  • Expert in creating, configuring, and fine-tuning ETL workflows designed in SSIS and Talend.
  • Excellent experience in using scheduling tools to automate batch jobs.
  • Experience in using data modeling tools such as Erwin and Sybase PowerDesigner.
  • Extensive experience in using SQL.
  • Excellent Database experience in PostgreSQL, SQL Server 2012/2017, Oracle 8x/9i/10g, T-SQL and PL/SQL (Stored Procedures, Functions, Triggers).
  • Good exposure to database performance evaluation and tuning.
  • Experienced in developing applications with tools such as Microsoft BIDS and Visual Studio.
  • Good exposure to Windows, Linux, and AWS cloud environments.
  • Excellent communication and interpersonal skills.
  • Able to manage changing responsibilities and deliver time-critical projects on schedule.
  • Hands-on experience in developing ETL tasks in Big Data environments using Pig, Python, and Hive.
  • Involved in the development of enterprise solutions in the healthcare, banking, and education domains.


Programming: Java, Python, JavaScript

Web Framework: AngularJS 2, Bootstrap (HTML5, CSS)

Mainframe Languages: COBOL, JCL

Database (RDBMS): SQL Server, DB2, PostgreSQL, MySQL


Data Modeling: Erwin

Database (NoSQL): MongoDB, HBase, Cassandra, Neo4J

IDE: Eclipse, Visual Studio Code

Application Server: Tomcat, Django

Build Tools: Ant, Maven, Gradle, GitHub

Microservices & RESTful: Spring, Swagger (API design)

Agile Methodology: Rational Unified Process (RUP), JIRA

Testing Framework: JUnit

Operating Systems: Windows, Linux

Cloud: AWS

Big Data Ecosystem: Apache Hadoop, Cloudera Hadoop, HDFS, YARN, MapReduce, Pig, Hive, Kafka, Sqoop, Flume, Spark


Confidential, Minneapolis, MN

Hadoop/Big Data Engineer


  • Crawling and extracting relevant provider information from web portals into a MySQL staging database.
  • Designing and developing Pig, Python, Hive, and Java MapReduce programs to process massive amounts of provider information.
  • Collaborating with the Solr team on index design.
  • Developing PoC repositories in HBase, MongoDB, and Cassandra.
  • Developing automation jobs in Linux bash shell.
  • Created ETL mappings with Talend Integration Suite to pull data from source systems, apply transformations, and load data into the target database.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in HDFS.
  • Developing purge and archive routines for HDFS file cleanup.
  • Creating stored procedures, views, and triggers.
  • Designing, developing, implementing, and assisting in validating ETL processes.
  • Designing and developing SSIS ETL workflows.
  • Built an EC2 instance (RHEL).
  • Migrating data to the cloud.

Environment: Java, Python, Hadoop Ecosystem (Pig, Hive, HDFS, MapReduce, YARN, Cloudera Manager, Sqoop, Flume, ZooKeeper, Kafka, Impala), Amazon Web Services (AWS), NoSQL Databases (MongoDB, Cassandra, HBase), MySQL, Cloudera, JIRA, Linux Shell Scripting, SSIS, Talend Studio Data Integration.


Student Developer


  • Providing support for the newly developed website by making the changes required by the design using HTML, CSS, and JavaScript.
  • Developing a database repository.
  • Developing Python programs for data extraction and cleansing related jobs.
  • Helping prospective international students with their transition to UHCL and answering their queries (by e-mail and phone) before their arrival in the United States.
  • Organized New International Student Orientation for Spring-16, Summer-16 and Fall-16.
  • Processing applications for International Student Advisors.

Environment: Bootstrap (HTML, CSS), JavaScript, Python, PostgreSQL, AngularJS.


Application Developer


  • Requirement analysis
  • Creating Conceptual Schema and ERD diagrams
  • Creating and maintaining the Physical Schema on MSSQL
  • Translating designs into code for assigned tasks and projects.
  • Ensuring that coding standards and guidelines are adhered to.
  • Collaborating with Business Analysts and Quality Engineers on defect mitigation and bug fixing.
  • Production maintenance and support.
  • Exploration Warehouse development for retail banking.
  • Archival status reports using SSRS
  • SSIS packages to load from mainframe datasets, external flat files
  • Code optimization and query tuning
  • Provide Automated Customer Account Transfer Service.
  • Task automation design, development and deployment
  • Developing data cleansing routines and parsers in Python
  • API documentation

Environment: SSIS, SSRS, MSSQL, Erwin, Python, COBOL, JCL, VSAM, DB2, CICS (IBM Mainframe), Django.


Database Developer


  • Requirement analysis for Conceptual, Logical, and Physical Data Models
  • Design and development of ETL and Metadata/MDM components using SSIS
  • Design and develop data stream ETL for external data sources
  • Perform data cleansing and quality checks using fuzzy lookups
  • Design and develop control flows
  • Automate the data consolidation process
  • Prepare Master Data Management repositories such as Customers, Products, and Services
  • Work with QA team on fixing data quality or process bugs
  • Perform ETL or Query tuning for better performance

Environment: SSIS, Python, MongoDB, T-SQL, SQL Server 2012, Erwin
