Sr. Big Data Engineer Resume

Harrisburg, PA


  • 11 years of overall experience in the IT industry in various roles (Big Data Engineer, ETL Systems Engineer and Database Administrator).
  • Senior Big Data Engineer and Hadoop Developer.
  • Excellent understanding / knowledge of Hadoop architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node and MapReduce Programming Paradigm.
  • Transforming and retrieving the data using Spark, Impala, Pig, Hive, SSIS and MapReduce.
  • Data streaming from various sources, both cloud (AWS, Azure) and on-premises, using Spark and Flume.
  • Experience in developing customized UDFs in Python to extend Hive and Pig Latin functionality.
  • Data importing and exporting using Sqoop from HDFS to relational database systems and vice versa.
  • Extensive use of the open-source languages Perl, Python, Scala and Java.
  • Excellent knowledge and extensive use of NoSQL databases (HBase).
  • Experience in Hadoop streaming and writing MapReduce jobs in Perl and Python as well as Java.
  • Excellent knowledge and extensive use of WebHDFS REST API commands.
  • Experience in automation and building CI/CD pipelines using Jenkins and Chef.
  • Develop generic SQL procedures and complex T-SQL statements for report generation.
  • Hands-on experience in data modelling with Star schema and Snowflake schema.
  • Business Intelligence Systems Engineer and Database Administration.
  • MCP Database Administrator for SQL Server 2005, 2008, 2008 R2, 2012 and 2014.
  • Excellent knowledge of the Business Intelligence tools SSIS, SSAS, SSRS, Informatica and PowerBI.
  • Design and implement the data distribution mechanisms on SQL Server (Transactional, Snapshot and Merge Replication, SSIS and DTS).
  • High Availability and Disaster Recovery Systems Design and Implementation on SQL Server (Always On, Mirroring and Log Shipping).
  • Hands-on experience with SQL Server Failover Cluster with the Active/Passive model.
  • Database Backup, Database Restore, Data Recovery and Data Protection on SQL Server.
  • SQL Server Capacity planning, Space Management, Data Partition and Data Management.
  • Excellent knowledge of Database/Data Warehousing concepts such as Normalization, Entity-Relationship Modelling, Dimensional Data Modelling, Schema and Metadata.
  • Monitoring Data Activities (Database Status, Logs, Space Utilization, Extents, Checkpoints, Locks and Long Transactions) and applying improvements.
  • Excellent knowledge of Microsoft Azure Services, Amazon Web Services and Management.
  • Side by side upgrade, In-Place upgrade and Data Migration.
  • Incident Management, SLA Management, TSG Maintenance and FTM Improvement.
  • Effectively plan and manage project deliverables with an on-site and offshore model and improve client satisfaction.
  • Responsible for team goal setting, timely feedback and performance improvement.


Big Data: Cloudera, Impala, Pig, Hive, Sqoop, Spark, Oozie, Flume, Hue, HDInsight, Zeppelin, Qubole

BI Tools: SSIS, SSAS, SSRS, Informatica PowerCenter, PowerBI, Zoom-Data

Languages: SQL, Perl, Scala, Java, PowerShell, Python, C#, VB.Net

Server Technologies: ASP.Net, ASP, Hibernate, JSP, JavaScript, XML, JSON

SQL Tools: BCP, TableDiff, DTA, SSIS, SQL Profiler

Operating Systems: RedHat 7, Ubuntu 12.x, Windows 7/8/2003/2008/2012, CentOS 6.0

Tools: Eclipse, IntelliJ, SQL Developer, Toad, VSTF, Git, JIRA

DevOps Tools: Jenkins, Chef

Databases: HDFS, HBase, SQL Server, SQL Azure, Oracle


Confidential, Harrisburg, PA

Sr. Big Data Engineer


  • Creating data pipelines for gathering, cleaning and optimizing data using Hive and Spark.
  • Gathering the data stored in AWS S3 from various third-party vendors, optimizing it and joining it with internal datasets to gather meaningful information.
  • Combining various datasets in Hive to generate business reports.
  • Using partitioning and bucketing in Hive to optimize queries.
  • Storing data in ORC, Parquet and Avro file formats with compression.
  • Moving data between cloud and on-premises Hadoop using DistCp and a proprietary ingest framework.
  • Scheduling Workflows, Coordinators and Bundles using Oozie.
  • Using the Spark DataFrame API in Scala for analysing data.
  • Using a Hadoop-on-cloud service (Qubole) to process data in AWS S3 buckets.
  • Programming using Java and Scala.
  • Using Jenkins and Maven as build tools.
  • Building a continuous integration and deployment pipeline using Jenkins and Chef.
  • Working in an agile methodology.

Environment: Hadoop, Hortonworks, Spark, Zeppelin, Qubole, Oozie, Hive, Sqoop, AWS, Jenkins, Chef, Red Hat Linux and Teradata.

Confidential, Philadelphia, PA

Big Data Engineer


  • Design and Develop Data Collectors and Parsers by using Perl or Python.
  • Experience in developing customized UDFs in Python to extend Hive and Pig Latin functionality.
  • Design and Develop Parsers for different file formats (CSV, XML, Binary, ASCII, Text, etc.).
  • Data Import and Export from various sources through scripts and Sqoop.
  • Extensive usage of Spark for data streaming and data transformation for real time analytics.
  • Extensively using WebHDFS REST API commands in Perl scripting.
  • Big Data management in Hive and Impala (Table, Partitioning, ETL, etc.).
  • Extensive Usage of Hue and other Cloudera tools.
  • Extensive usage of the NoSQL database HBase.
  • Design and Create Dimension and Fact Tables as per the KPIs.
  • Design and Develop Dashboards with KPIs as per the Metrics.
  • Design and Develop Dashboards in Zoom-Data and Write Complex Queries and Data Aggregation.
  • Extensive usage of Cloudera Hadoop distribution.
  • Shell Programming and Crontab automation.

Environment: Hadoop, Cloudera, Spark, Zeppelin, Hue, Impala, Pig, Hive, Sqoop, Zoom-Data, Red Hat Linux and Oracle.

Confidential, Harrisburg, PA

BI Engineer


  • Transform the raw data into meaningful information to support business decisions.
  • Develop, deploy and troubleshoot the ETL workflows using Hive, Pig and Sqoop.
  • Develop MapReduce jobs for data clean-up in Python and C#.
  • Dimension and fact design, and data modelling for reporting purposes.
  • Design and implement the data mart and improve its performance.
  • Identify the reports for decision-making and create the dashboards using PowerBI.
  • Data analysis and management reporting: generate metrics, data discovery, dashboards and scorecards.
  • Writing generic SQL procedures and complex T-SQL statements for report generation.
  • Creating complex graphical representations to identify the service health for decision-making.
  • Maintain the HDInsight clusters, troubleshoot issues, and coordinate with partners.
  • Prepare the metrics and bandwidth utilization of the team, identify the root cause of spikes, and apply improvements.
  • Identify the process gaps and define the process to optimize support.
  • Extensive usage of Azure Portal, Azure PowerShell, Storage Accounts, Certificates and Azure Data Management.
  • Excellent knowledge of Confidential Cloud technology, components and subscription management.
  • Prepare and present the metrics for team utilization and environment status in PowerBI, PowerPoint and SQL Azure.
  • Responsible for team goal setting, timely feedback and performance improvement.

Environment: Hadoop, Hortonworks, HDInsight 3.1, YARN, Oozie, Pig, Hive, Sqoop, PowerBI, Azure Storage and SQL Server 2014.

Confidential, PA

ETL System Engineer / SQL DBA


  • Deploy and Troubleshoot ETL jobs that use SSIS packages.
  • Hands-on experience with data integrity processes and data modelling concepts.
  • Manage and troubleshoot the multi-dimensional data cubes developed in SSAS.
  • Manage large data movements with partitions and data management.
  • Back up the databases, restore the databases, and troubleshoot issues.
  • Set up Always On, including the Windows cluster, and troubleshoot issues.
  • Set up Snapshot and Transactional replication in the Always On environment.
  • Troubleshoot SQL Server performance issues and optimize the T-SQL statements.
  • Extensively using the Business Intelligence tools (SSIS, SSAS and SSRS).
  • Manage security through Logins, Users, Permissions, Certificates, Credentials and Encryption Schemes as per the requirements.
  • SQL Server space, storage and database management for OLAP systems.
  • Extensive usage of SQL Server utilities BCP, TableDiff, DTA, DTEXEC, Profiler and SQLCMD.
  • Migrate databases to the SQL Azure cloud platform, including performance tuning.
  • Build and maintain the environment on Azure IaaS and PaaS.
  • Lead the team and manage project deliverables with an on-site and offshore model.
  • Responsible for team goal setting, timely feedback and performance improvement.

Environment: SQL Server 2005, 2008R2 & 2012, IIS, Windows 2012, SSIS, SSAS, SSRS, Entity Framework.

Confidential, Chambersburg, PA

Senior Support Analyst


  • Resolved issues related to the Enterprise Data Warehouse (EDW) and stored procedures in the OLTP system; analysed, designed and developed ETL strategies.
  • Identified performance issues in existing sources, targets and mappings by analysing the data flow and evaluating transformations, and tuned them accordingly for better performance.
  • Worked with heterogeneous sources to extract data from Oracle databases, XML and flat files and load it into a relational Oracle warehouse.
  • Troubleshot standard and reusable mappings and mapplets using various transformations like Expression, Aggregator, Joiner, Router, Lookup (Connected and Unconnected) and Filter.
  • Performed tuning of SQL queries and stored procedures for speedy extraction of data to resolve and troubleshoot issues in the OLTP environment.
  • Troubleshot long-running sessions and fixed the related issues.
  • Worked with Variables and Parameters in the mappings to pass values between sessions.
  • Involved in the development of PL/SQL stored procedures, functions and packages to process business data in the OLTP system.
  • Worked with the Services and Portal teams on various occasions on data issues in the OLTP system.
  • Worked with the testing team to resolve bugs related to day one ETL mappings before production.
  • Created the weekly project status reports, tracked the progress of tasks against the schedule and reported any risks and contingency plans to management and business users.

Environment: Informatica PowerCenter 8.6.1, Oracle 11g/10g/9i/8i, PL/SQL, SQL Developer 3.0.1, Toad 11.
