Datawarehouse Engineer Resume

Chicago, IL


  • Excellent understanding/knowledge of Hadoop architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node and the MapReduce programming paradigm.
  • 8 years of experience in the IT industry, including 4 years of experience in Big Data technologies.
  • Hands-on experience with major components of the Hadoop ecosystem such as Hadoop MapReduce, HDFS, Hive, Pig, HBase, Sqoop, Oozie and Flume.
  • Excellent knowledge of HIPAA standards, EDI (Electronic Data Interchange), transaction syntax such as ANSI X12, implementation and knowledge of HIPAA code sets, ICD-9 and ICD-10 coding, and HL7.
  • Experience with the CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters.
  • Expertise in working with different kinds of data files such as XML, JSON, Parquet and Avro, and with databases.
  • Experience in shell and Python scripting languages.
  • Extensive experience in developing Pig Latin scripts for transformations and using Hive Query Language for data analytics.
  • Good experience working on both Hadoop distributions: Cloudera and Hortonworks.
  • Involved in Spark-Cassandra data modeling.
  • Worked on the Apache Flume distributed service.
  • Experience in importing data from databases such as MySQL, Oracle and Teradata into HDFS, and exporting it back, using Sqoop.
  • Hands-on experience working with Spark using both Scala and Python.
  • Performed various actions and transformations on Spark RDDs and DataFrames.
  • Experience with the Oozie workflow engine in running workflow jobs with actions that run Hadoop MapReduce and Pig jobs.
  • Implemented Oozie workflows using Sqoop, Pig, Hive and shell actions, and the Oozie coordinator, to automate tasks.
  • Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster data processing.
  • Hands-on experience writing MapReduce jobs in Java, Pig and Python, including MapReduce programs to analyze data and discover trends in data usage by users.
  • Experienced with batch processing of data sources using Apache Spark.
  • Hands-on experience writing Python scripts for data extraction and data transfer from various data sources.
  • Experienced in performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism and memory tuning.
  • Experienced in implementing Spark RDD transformations and actions for business analysis.
  • Used Flume to collect, aggregate and store web log data in HDFS.
  • Used ZooKeeper for various types of centralized configurations.
  • Extensive knowledge of and experience with real-time data streaming technologies such as Kafka, Storm and Spark Streaming.
  • Strong understanding of AWS compute services such as EC2, Elastic MapReduce (EMR) and EBS, and of accessing instance metadata.
  • Designed and implemented complex SSIS package to migrate data from multiple data sources.
  • Developed queries for drill down, cross tab, sub reports and ad-hoc reports using SSRS.
  • Processed data from cubes and SSIS to generate reports by report server in SSRS.
  • Strong ability to understand new concepts and applications.
  • Excellent verbal and written communication skills, proven highly effective in interfacing across business and technical groups.


Hadoop Ecosystem Development: HDFS, MapReduce, Hive, Impala, Pig, Oozie, HBase, Sqoop, Flume, YARN, Scala, Spark, Kafka and ZooKeeper.

Hadoop Distribution System: Cloudera, Hortonworks.

Languages: PL/SQL, Transact-SQL, SQL, C/C++, Java, Scala, Python.

Scripting: Pig Latin.

Database: Teradata, MySQL, MS SQL Server, Hive.

NoSQL Database: Apache HBase, MongoDB, Cassandra.

ETL Tools: Apache Pig, Talend and Tableau.

Web Design Tools: HTML, DHTML, REST, AJAX, JavaScript, jQuery, CSS, AngularJS, ExtJS and JSON.

Frameworks: MVC, Struts, Hibernate and Spring.

Operating Systems: Linux (CentOS, Ubuntu), Unix, Windows 7/Vista/XP/2000/NT, Server 2012/2008R2, Mac.


Confidential, Chicago, IL

Datawarehouse Engineer


  • Analyzed all the tables in the database and listed the classified columns. Created hashing algorithms in Python to hash those columns.
  • Created user views on Teradata to enforce data abstraction on Hive tables.
  • Designed and implemented data imports to HDFS, Hive and HBase from different RDBMS servers using Sqoop.
  • Used Flume to collect, aggregate and store web log data from different sources such as web servers and pushed it to HDFS.
  • Used Teradata Parallel Transporter (TPT) to load data from Hive/HDFS to Teradata and vice versa.
  • Loaded data from several flat-file sources to staging using MLOAD and FLOAD.
  • Successfully loaded files to Hive and HDFS from SQL Server using Sqoop.
  • Performed data cleansing using Python and loaded the results into the target tables.
  • Managed and scheduled jobs through the Oozie scheduler.
  • Worked on the ETL process, importing data from various data sources and performing transformations.
  • Changed the existing ETL pipeline to make it GDPR compliant.
  • Optimized Teradata queries and ETL jobs to reduce the pipeline time by 30%.
  • Developed YAML scripts to automate data pipelines.
  • Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Efficiently put and fetched data to/from HBase by writing MapReduce jobs.
  • Developed counters on HBase data to count total records on different tables.
  • Developed Python scripts for automatic purging of data on Hadoop clusters.
  • Collected JSON data from an HTTP source and developed Spark APIs that help perform inserts and updates in Hive tables.
  • Developed a Scala script to read all the Parquet data in HDFS and write it out as ORC files, and another script to parse the files into structured tables in Hive.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames and pair RDDs on YARN.
  • Involved in performance tuning of Spark jobs using caching and taking full advantage of the cluster environment.
  • Designed a Rights Track as part of GDPR and developed a data pipeline using Teradata, Hive and Spark for the Rights Track.

Environment: Hadoop, HDP, MapReduce, Hive QL, MySQL, Teradata, TPT, SQL, HBase, HDFS, Hive, Impala, Pig, Sqoop, Oozie, Apache Spark, Python, Scala, ZooKeeper, Hue, Opswise, YARN, YAML, UNIX.

Confidential, Rochester, MN

Data Engineer


  • Created and enforced policies to achieve HIPAA compliance.
  • Monitored system health and logs and responded accordingly to any warning or failure conditions.
  • Involved in maintaining various Unix Shell scripts.
  • Created S3 buckets in the AWS environment to store files.
  • Configured S3 buckets with various lifecycle policies to archive infrequently accessed data to storage classes based on requirements.
  • Migrated 160 tables from Oracle to HDFS and HDFS to Cassandra.
  • Involved in file movements between HDFS and AWS S3.
  • Implemented Python code for retrieving social media data.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Developed a data pipeline using Kafka and Storm to store data into HDFS.
  • Involved in importing real-time data to Hadoop using Kafka and implemented the Oozie job for daily imports.
  • Automated all the jobs, from pulling data from different data sources such as MySQL to pushing the result set to the Hadoop Distributed File System using Sqoop.
  • Imported data from different sources such as HDFS/HBase into Spark RDDs.
  • Involved in running Hadoop streaming jobs to process terabytes of text data.
  • Worked on creating custom ETL scripts using Python for business-related data.
  • Converted all jobs to run in EMR by configuring the cluster according to the data size.
  • Performed ETL operations using Elastic MapReduce and Redshift.
  • Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
  • Developed counters on HBase data to count total records on different tables.
  • ETL Data Cleansing, Integration & Transformation using Pig scripts for managing data from disparate sources.
  • Used HBase to store majority of data which needs to be divided based on region.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Created Spark jobs for processing data from the S3 data lake to Redshift.
  • Used coalesce and repartition on DataFrames while optimizing the Spark jobs.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Performed performance tuning for Spark Streaming, e.g. setting the right batch interval, the correct level of parallelism, selection of the correct serialization and memory tuning.
  • Analyzed business requirements and data sources from Excel, Oracle and SQL Server for design, development, testing and production rollover of reporting and analysis projects within Tableau.
  • Implemented the Fair Scheduler on the Job Tracker to share cluster resources among the MapReduce jobs submitted by users.
  • Created EBS volumes for storing application files for use with EC2 instances whenever they are mounted to them.
  • Used Zookeeper for various types of centralized configurations.
  • Worked on creating Kafka topics and partitions and writing custom partitioner classes.
  • Worked on creating Kafka adaptors for decoupling the application dependency.
  • Exported the analyzed data to relational databases using Sqoop for visualization and report generation.
  • Integrated BI tool with Impala for visualization.
  • Worked in Agile development environment in sprint cycles of two weeks by dividing and organizing tasks.

Environment: Hadoop, CDH 5, MapReduce, Hive QL, MySQL, HBase, AWS, HDFS, Hive, Impala, Pig, Sqoop, Oozie, Flume, Apache Spark, Python, Scala, Cloudera, ZooKeeper, Hue Editor, Oracle 11g, PL/SQL, UNIX, Tableau.

Confidential, Golden Valley, MN

Hadoop Developer


  • Created SSIS packages to import data from Oracle databases, XML, text files and Excel files.
  • Developed queries for drill down, cross tab, sub reports and ad-hoc reports using SSRS.
  • Wrote optimized SQL queries in SQL Query Analyzer for efficient handling of huge loads of data.
  • Created views to restrict access to data in a table for security purposes.
  • Created SSIS package to extract, validate and load data into Data warehouse.
  • Developed backup and restore scripts for SQL Server as needed.
  • Designed and implemented a database maintenance plan.
  • Configured job scheduling, batch, alert and e-mail notification settings.
  • Involved in start to end process of Hadoop cluster installation, configuration and monitoring.
  • Migrated data from existing data stores to Hadoop.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into the Hadoop Distributed File System (HDFS) and moved data from MySQL into HDFS and vice versa using Sqoop.
  • Involved in loading data from UNIX file system to HDFS.
  • Involved in managing and reviewing Hadoop log files.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries and views using Hive QL.
  • Designed a data warehouse using Hive.
  • Developed Pig UDFs to pre-process the data for analysis.
  • Supported Data Analysts in running MapReduce Programs.
  • Worked on Performance tuning on MapReduce jobs.
  • Worked on the Cloudera distribution for running Hadoop jobs.
  • Worked on analyzing data with Hive and Pig.
  • Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
  • Used Tableau to create reports representing analysis in graphical format.
  • Created customized reports and processes in Tableau Desktop.
  • Worked on data analysis and sent the reports to clients on a daily basis.
  • Worked with team and collaborated to meet project timelines.

Environment: CDH 3, PL/SQL, MySQL, SQL Server 2008 (SSRS & SSIS), Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, Java, UNIX, Tableau.


SQL Developer


  • Create database objects including tables, triggers, views, stored procedures, indexes, defaults and rules.
  • Tuning and optimizing queries and indexes.
  • Monitor SQL Server log files.
  • Converted and loaded data from different databases and files.
  • Perform optimization of SQL queries in SQL Server and Sybase systems.
  • Create jobs and monitor job history for maximum availability of data and to ensure consistency of the databases.
  • Create and maintain databases after logical and physical database design.
  • Maintain disaster recovery strategies for the database and fail-over methods.
  • Monitored the performance of the database server.
  • Maintenance of clustered and non-clustered indexes.
  • Monitor server space usage and generate reports.
  • Worked with SharePoint Server 2007 while deploying SSRS reports.
  • Create and maintain database for Incident, Problem Tracking, and Metrics.
  • Created packages in SSIS with error handling and mapping using different tasks in the designer.
  • Designed and implemented complex SSIS package to migrate data from multiple data sources.
  • Used transformations such as Merge, Data Conversion, Conditional Split and Multicast to distribute and manipulate data to the destination in SSIS.
  • Processed data from cubes and SSIS to generate reports by report server in SSRS.

Environment: PL/SQL, MySQL, SQL Server 2008 (SSRS & SSIS), Visual Studio, MS Excel.


Programmer Analyst/SQL Developer


  • Developed SQL scripts to perform different joins, subqueries, nested queries and inserts/updates, and created and modified stored procedures, triggers, views and indexes.
  • Responsible for maintaining databases.
  • Performed intermediate queries using SQL, including inner/outer/left joins and UNION/INTERSECT.
  • Responsible for implementing and monitoring database systems.
  • Designed and modified physical databases with development teams.
  • Worked with business analysts and users to understand the requirements.
  • Responsible for designing advanced SQL queries, procedures, cursors and triggers.
  • Built data connections to the database using MS SQL Server.
  • Worked on a project to extract data from XML files into SQL tables and generate data file reporting using SQL Server 2008.
  • Created drill-through, drill-down, cross-tab, cached and snapshot reports to give all the details of various transactions such as closed transactions, pending approvals and transaction summaries, and scheduled these reports to run on a monthly basis.
  • Created reports and designed graphical representations of analyzed data using reporting tools.

Environment: MS SQL Server 2008/2005, SQL Server Integration Services 2008, SQL Server Analysis Services 2008, MS Visual Windows 2003/2000.
