
Senior Data Engineer Resume

Rochester, MN


  • Over 11 years of experience in Information Technology with strong working knowledge in Data Analytics, Data Engineering, Data Warehousing, and Analytical Reporting
  • Over 4 years of design, development, and support experience in Big Data and Hadoop ecosystem technologies, including Core Spark, Spark SQL, Scala, functional programming, Spark Streaming, Hadoop, MapReduce, Pig, Hive, HBase, Sqoop, Flume, Hue, Oozie, and ZooKeeper.
  • Over 1 year of experience in Predictive Modelling using Machine Learning, Python, and Spark MLlib.
  • Excellent knowledge of Hadoop ecosystem components such as HDFS, ResourceManager, NameNode, and DataNode, and of the MapReduce programming paradigm.
  • Experience in designing and developing ETL jobs in Spark SQL using Scala & Python to perform in-memory data processing.
  • Hands-on experience in writing ad hoc queries for moving data from HDFS to Hive and analyzing the data using HiveQL & Spark SQL.
  • Hands-on experience with the SBT tool suite for development of Scala applications.
  • Hands-on experience with Scala for batch processing and Spark Streaming data.
  • Proficient in Scala Programming Language, Data extraction, Data cleaning, Data Loading, Data Transformation and Data visualization using Scala and Tableau
  • Knowledge of various Hadoop file formats such as Avro, Parquet, and JSON, and compression codecs such as Snappy and Gzip.
  • Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.
  • Experience with the Oozie workflow scheduler to manage Hadoop jobs as a Directed Acyclic Graph (DAG) of actions with control flows.
  • Experience with the Oozie Workflow Engine in running workflow jobs with actions that run Hadoop MapReduce and Pig jobs.
  • Worked on importing data into HBase using HBase Shell and HBase Client API.
  • Experience in working on various databases and database script development using SQL and PL/SQL.
  • Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
  • Experience in working with Amazon EC2, Cloudera (CDH5) and Hortonworks (HDP 2.4) Hadoop Distributions.
  • Knowledge of data warehousing and ETL tools such as Informatica and ODI, and BI reporting tools such as OBIEE, SSRS, and Tableau.
  • Actively participated in organizing and training for User Acceptance Testing (UAT) for end-to-end product software releases.
  • Highly motivated, self-starter with a positive attitude, willingness to learn new concepts and acceptance of challenges.
  • Major strengths include familiarity with multiple software systems, the ability to learn new technologies quickly and adapt to new environments, and excellent interpersonal, technical, and communication skills.
  • Good experience in working with industries including Finance, Airline, Manufacturing, Telecommunication, Healthcare and Media


Operating Systems: Unix, Linux, Windows, VMWare

Hadoop Distribution: Apache, Hortonworks, Cloudera

Big Data Technologies: Apache Spark, Spark SQL, Spark Streaming, GraphX, Hadoop (MRv2), HDFS, MapReduce, Hive, Pig, Sqoop, HBase, Oozie, Flume, Hue

Programming Languages: SQL, Scala, Python, Core Java, Shell script, Pig Latin, Sqoop, HiveQL, Flume, Spark SQL

Databases: Oracle, MS-SQL Server, Teradata, MySQL, HBase (NoSQL)

BI Technologies: Tableau, OBIEE 11g, SSRS, BI Applications 7.9.x, BI Publisher, MS Excel

ETL Tools: Informatica, SSIS, Oracle Data Integrator (ODI)


Confidential - Rochester, MN

Senior Data Engineer


  • Created a Data Lake by extracting patient data using Sqoop from various data sources into HDFS, including DB2, Mainframes, RDBMS, CSV, and Excel.
  • Developed an ingestion module to ingest data into HDFS from heterogeneous data sources.
  • Used Spark Scala APIs over Hadoop YARN to perform analytics on data in Hive.
  • Worked with the Data Science team on Predictive Modeling to build a readmission risk model to identify readmissions using the random forest algorithm.
  • Analyzed the SQL scripts and designed the solution for implementation in Spark.
  • Built distributed in-memory applications using Spark and Spark SQL to do analytics efficiently on huge data sets.
  • Efficiently used Spark transformations and actions to build simple/quick and complex ETL applications.
  • Developed Python code to provide data analysis and generate complex data reports.
  • Uploaded data to Hadoop Hive and combined new tables with existing databases.
  • Involved in preparing design, unit test, and integration test documents.

Environment: HDP 2.3.4, Scala, Hadoop 2.7, Apache Spark, Spark SQL, Hive, Sqoop, Tez, Python, MapReduce, Java 7, Hue, Ambari, Avro, Parquet

Confidential, Lake Forest, CA

Big Data Engineer


  • Implemented various POCs in Spark using Scala & Python to compare the performance of Spark with Hive and Oracle SQL.
  • Worked on converting Hive queries to Spark SQL DataFrames to compare the performance.
  • Worked on data import/export into HDFS and Hive using Sqoop.
  • Created & scheduled Oozie workflows to execute MapReduce, Sqoop, Hive, and Pig jobs.
  • Wrote Pig Scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
  • Processed HDFS data and created external tables using Hive, in order to analyze visitors per day, page views and most purchased products.
  • Exported analyzed data to HDFS using Sqoop for generating reports.
  • Used MapReduce and Sqoop to load, aggregate, store and analyze web log data from different web servers.
  • Extracted files from MySQL and Oracle through Sqoop and placed in HDFS and processed.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data using Pig & Hive.
  • Developed Hive queries for Business Analysts
  • Responsible for managing data coming from different sources, including Oracle, MySQL, CSV files, and JSON files.
  • Supported MapReduce programs running on the cluster.
  • Involved in loading data from UNIX file system to HDFS using batch scripts.
  • Installed and configured Hive and wrote Hive UDFs.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Worked on modifying and executing the UNIX shell scripts.
  • Involved in web testing using REST API for different member and provider portals.
  • Involved in building and maintaining test plans, test suites, test cases, defects and test scripts
  • Conducted functional, system, data and regression testing.
  • Involved in Bug Review meetings and participated in weekly meetings with management team.

Environment: CDH 5.4, Java 7, Eclipse, Linux, Hadoop, HBase, Sqoop, Pig, Hive, MapReduce, Informatica, Tableau, OBIEE, MS Excel, Apache Spark, Spark SQL

Computer Technology Resources Inc, Irvine, CA, USA


BI Architect: Design, Development, Administration and Support


  • Involved in architecture design, development and implementation of Hadoop deployment, backup and recovery systems.
  • Worked on the proof of concept (POC) for Apache Hadoop framework initiation.
  • Worked on numerous POCs to prove if Big Data is the right fit for a business case.
  • Developed MapReduce jobs for Log Analysis, Recommendation and Analytics.
  • Wrote MapReduce jobs to generate reports on the number of activities created on a particular day from data dumped from multiple sources; the output was written back to HDFS.
  • Reviewed the HDFS usage and system design for future scalability and fault-tolerance. Installed and configured Hadoop HDFS, MapReduce, Pig, Hive, Sqoop.
  • Wrote Pig Scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
  • Processed HDFS data and created external tables using Hive, in order to analyze visitors per day, page views and most purchased products.
  • Exported analyzed data to HDFS using Sqoop for generating reports.
  • Used MapReduce and Sqoop to load, aggregate, store and analyze web log data from different web servers.
  • Developed Hive queries for the analysts.
  • Worked on Finance, Procurement, HR (Workforce) and Order Management projects for OBIEE and Informatica Administration and Report development
  • Developed FITGAP Analysis document based on OOTB OBIA analytics application code and client’s existing code and their business requirements
  • Installed and configured OBIA financial analytics and supply chain and order management analytics applications and merged OBIEE RPD with existing projects.
  • Applied performance tuning techniques to improve dashboard performance
  • Developed Informatica mappings, transformation logic, sessions, and workflows to load data from Oracle APPS and flat-file sources to OBIA DW (Oracle).
  • Point of contact for all production issues related to HR, Finance and Order Management projects.

Environment: Cloudera (CDH 5.0), MapReduce, Hive, Pig, Sqoop, Oracle, MySQL, Informatica, OBIEE, Oozie, ZooKeeper


BI Lead


  • Worked closely with business users and business analysts to understand business requirements
  • Developed LM Finance Review subject area in OBIEE and developed several reports
  • Customized AP, AR, and GL subject areas and dashboards/reports in OBIEE
  • Installed and configured OBIEE on new Linux servers
  • Installed and configured Informatica Power Center 9.0.1 server & clients on Red Hat Linux 5 with OBIA
  • Installed and configured DAC (Data Warehouse Administration Console) on a Linux server and applied the new patch available on Oracle Support.
  • Migrated OBIEE RPD and web catalog from UAT to production environment
  • Customized OBIEE repository, reports and dashboards in 11g
  • Customized look and feel of reports and dashboards and also replaced Oracle banner with Confidential logo as per business requirements
  • Customized and developed SDE and SIL mappings, transformation logic, sessions, and workflows in Informatica to load data from the Oracle EBS database to the EDW
  • Implemented various passive and active transformations such as Source Qualifier, Expression, Aggregator, Lookup, Filter, Update Strategy, and Router using the Transformation Developer
  • Developed new aggregate fact tables to improve OBIEE dashboard performance
  • Customized DAC tasks, subject areas, indices, and execution plans, and scheduled daily incremental loads
  • Supported and fixed issues raised in Quality Center during pre-production and post-production

Environment: OBIEE, OBIA Analytics, WebLogic 10.1.3, WebLogic Admin Console, WebLogic Enterprise Manager, Informatica 9.1, Oracle EBS R12, Oracle 11.1.2, Windows 7 and Linux 5.5


Senior BI Consultant (OBIA/ETL)


  • Worked closely with business users and business analysts to understand business requirements and the existing reports and dashboards available in Siebel Analytics
  • Presented the reports with a user-friendly, corporate look and feel
  • Customized the log-in page, reports, and dashboards
  • Embedded Flash videos in dashboards to improve the user interface.
  • Developed technical design documents based on functional requirements and GAP analysis document
  • Customized/developed SDE and SIL mappings, transformation logic, sessions and workflows in
  • Implemented data level security, subject area level security and dashboard level security
  • Optimized performance by developing aggregated fact tables and configured the RPD with the aggregated tables
  • Migrated ETL mappings/Workflows, DAC repository, RPD, reports, dashboards from dev to test and to production
  • Troubleshot ongoing production issues by reviewing design changes made to production systems and made corresponding changes to Informatica mappings/workflows and/or OBIEE RPD, reports, and dashboards

Environment: OBIEE, Siebel Analytics Applications, OBIEE Admin Tool, OBIEE Analytics Web (Oracle Answers, Dashboards, Delivers and BI Publisher), DAC, Informatica 8.6.1
