Sr. Big Data Engineer Resume
Harrisburg, PA
SUMMARY
- Over 11 years of experience in the IT industry in various roles, including Big Data Engineer, ETL Systems Engineer, and Database Administrator.
- Senior Big Data Engineer and Hadoop Developer.
- Excellent understanding/knowledge of Hadoop architecture and its components, such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and the MapReduce programming paradigm.
- Transforming and retrieving data using Spark, Impala, Pig, Hive, SSIS, and MapReduce.
- Streaming data from cloud (AWS, Azure) and on-premises sources using Spark and Flume.
- Experience in developing customized UDFs in Python to extend Hive and Pig Latin functionality (see the sketch after this list).
- Importing and exporting data with Sqoop between HDFS and relational database systems.
- Extensive use of the open-source languages Perl, Python, Scala, and Java.
- Excellent knowledge of and extensive experience with NoSQL databases (HBase).
- Experience in Hadoop Streaming and writing MapReduce jobs in Perl and Python in addition to Java.
- Excellent knowledge and extensive use of WebHDFS REST API commands.
- Experience in automation and building CI/CD pipelines using Jenkins and Chef.
- Developing generic SQL stored procedures and complex T-SQL statements for report generation.
- Hands-on experience in data modelling with star and snowflake schemas.
- Experience as a Business Intelligence Systems Engineer and in Database Administration.
- MCP Database Administrator for SQL Server 2005, 2008, 2008 R2, 2012 and 2014.
- Excellent knowledge of the Business Intelligence tools SSIS, SSAS, SSRS, Informatica, and PowerBI.
- Design and implement data distribution mechanisms on SQL Server (transactional, snapshot, and merge replication; SSIS; and DTS).
- High Availability and Disaster Recovery Systems Design and Implementation on SQL Server (Always On, Mirroring and Log Shipping).
- Hands-on experience with SQL Server failover clustering in an Active/Passive model.
- Database backup, restore, data recovery, and data protection on SQL Server.
- SQL Server capacity planning, space management, data partitioning, and data management.
- Excellent knowledge of database/data warehousing concepts such as normalization, entity-relationship modelling, dimensional data modelling, schemas, and metadata.
- Monitoring data activities (database status, logs, space utilization, extents, checkpoints, locks, and long transactions) and applying improvements.
- Excellent knowledge of Confidential Azure services, Amazon Web Services, and their management.
- Side-by-side upgrades, in-place upgrades, and data migration.
- Incident Management, SLA Management, TSG Maintenance and FTM Improvement.
- Effectively planning and managing project deliverables in an onsite/offshore model and improving client satisfaction.
- Responsible for team goal setting, timely feedback, and improving team performance.
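A minimal sketch of the Python Hive UDF pattern referenced above, wired in through Hive's TRANSFORM clause (Hadoop Streaming); the script name, table, and column names are hypothetical.

```python
#!/usr/bin/env python
# normalize_phone.py - hypothetical streaming UDF for Hive's TRANSFORM clause.
# Hive pipes rows to stdin as tab-separated fields and reads rows back from stdout.
import re
import sys

for line in sys.stdin:
    customer_id, raw_phone = line.rstrip("\n").split("\t")
    digits = re.sub(r"\D", "", raw_phone)                   # keep digits only
    normalized = digits[-10:] if len(digits) >= 10 else ""  # last 10 digits or blank
    print("%s\t%s" % (customer_id, normalized))
```

Registered with `ADD FILE normalize_phone.py;` and invoked as `SELECT TRANSFORM(customer_id, phone) USING 'python normalize_phone.py' AS (customer_id, phone) FROM raw_customers;` (table and columns are placeholders).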
TECHNICAL SKILLS
Big Data: Cloudera, Impala, Pig, Hive, Sqoop, Spark, Oozie, Flume, Hue, HDInsight, Zeppelin, Qubole
BI Tools: SSIS, SSAS, SSRS, Informatica PowerCenter, PowerBI, Zoom-Data
Languages: SQL, Perl, Scala, Java, PowerShell, Python, C#, VB.Net
Server Technologies: ASP.Net, ASP, Hibernate, JSP, JavaScript, XML, JSON
SQL Tools: BCP, TableDiff, DTA, SSIS, SQL Profiler
Operating Systems: RedHat 7, Ubuntu 12.x, Windows 7/8/2003/2008/2012, CentOS 6.0
Tools: Eclipse, IntelliJ, SQL Developer, Toad, VSTF, Git, JIRA
DevOps Tools: Jenkins, Chef
Databases: HDFS, HBASE, SQL Server, SQL Azure, Oracle
PROFESSIONAL EXPERIENCE
Confidential, Harrisburg, PA
Sr. Big Data Engineer
Responsibilities:
- Create data pipelines to gather, clean, and optimize data using Hive and Spark.
- Gather third-party vendor data stored in AWS S3, optimize it, and join it with internal datasets to derive meaningful information (see the sketch after this section).
- Combine datasets in Hive to generate business reports.
- Use partitioning and bucketing in Hive to optimize queries.
- Store data in ORC, Parquet, and Avro file formats with compression.
- Move data between cloud and on-premises Hadoop using DistCp and a proprietary ingest framework.
- Schedule workflows, coordinators, and bundles using Oozie.
- Use the Spark DataFrame API in Scala to analyse data.
- Use a Hadoop-on-cloud service (Qubole) to process data in AWS S3 buckets.
- Program in Java and Scala.
- Use Jenkins and Maven as build tools.
- Build continuous integration and deployment pipelines using Jenkins and Chef.
- Work in an Agile methodology.
Environment: Hadoop, Hortonworks, Spark, Zeppelin, Qubole, Oozie, Hive, Sqoop, AWS, Jenkins, Chef, Red Hat Linux and Teradata.
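A hypothetical PySpark sketch of the S3 ingest, join, and optimized-write pattern described above; the actual pipeline used the Scala DataFrame API, and the bucket, table, and column names below are placeholders.

```python
# Hypothetical sketch of the S3 ingest-join-optimize pattern; names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("vendor-enrichment")
         .enableHiveSupport()
         .getOrCreate())

# Third-party vendor feed landed in S3.
vendor = spark.read.parquet("s3a://example-vendor-bucket/claims/2019/")

# Internal reference data already managed in Hive.
members = spark.table("warehouse.members")

# Join, keep only the columns the report needs, and drop obvious bad rows.
enriched = (vendor.join(members, on="member_id", how="inner")
                  .where(F.col("claim_amount") > 0)
                  .select("member_id", "claim_id", "claim_amount", "region", "claim_date"))

# Write back as compressed ORC, partitioned by region to match the Hive table layout.
(enriched.write
         .mode("overwrite")
         .partitionBy("region")
         .format("orc")
         .option("compression", "zlib")
         .saveAsTable("warehouse.vendor_claims_enriched"))
```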
Confidential, Philadelphia, PA
Big Data Engineer
Responsibilities:
- Design and develop data collectors and parsers in Perl and Python (see the sketch after this section).
- Develop customized UDFs in Python to extend Hive and Pig Latin functionality.
- Design and develop parsers for different file formats (CSV, XML, binary, ASCII, text, etc.).
- Import and export data from various sources through scripts and Sqoop.
- Use Spark extensively for data streaming and transformation for real-time analytics.
- Use WebHDFS REST API commands extensively in Perl scripts.
- Manage big data in Hive and Impala (tables, partitioning, ETL, etc.).
- Use Hue and other Cloudera tools extensively.
- Use the NoSQL database HBase extensively.
- Design and create dimension and fact tables per the KPIs.
- Design and develop dashboards with KPIs per the metrics.
- Design and develop dashboards in Zoom-Data and write complex queries and data aggregations.
- Use the Cloudera Hadoop distribution extensively.
- Shell programming and crontab automation.
Environment: Hadoop, Cloudera, Spark, Zeppelin, Hue, Impala, Pig, Hive, Sqoop, Zoom-Data, Red Hat Linux and Oracle.
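A hypothetical Python sketch of the collector/parser pattern described above, landing a cleaned feed in HDFS through the WebHDFS REST API (the resume describes doing this from Perl as well); the NameNode address, user, and paths are placeholders.

```python
# Hypothetical collector: parse a CSV feed and write it to HDFS via WebHDFS REST.
import csv
import io
import requests

NAMENODE = "http://namenode.example.com:50070"   # placeholder NameNode HTTP address
HDFS_PATH = "/data/raw/vendor_feed/feed.tsv"     # placeholder target path
USER = "hdfsuser"                                # placeholder HDFS user

def parse_feed(local_csv):
    """Read the raw CSV and re-emit it as tab-separated, trimmed rows."""
    out = io.StringIO()
    with open(local_csv, newline="") as fh:
        for row in csv.reader(fh):
            cleaned = [col.strip() for col in row]
            out.write("\t".join(cleaned) + "\n")
    return out.getvalue().encode("utf-8")

def write_to_hdfs(data):
    """Two-step WebHDFS CREATE: the NameNode replies with a redirect to a DataNode."""
    create_url = ("%s/webhdfs/v1%s?op=CREATE&overwrite=true&user.name=%s"
                  % (NAMENODE, HDFS_PATH, USER))
    first = requests.put(create_url, allow_redirects=False)
    first.raise_for_status()
    datanode_url = first.headers["Location"]     # DataNode URL to receive the bytes
    second = requests.put(datanode_url, data=data)
    second.raise_for_status()

if __name__ == "__main__":
    write_to_hdfs(parse_feed("feed.csv"))
```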
Confidential, Harrisburg, PA
BI Engineer
Responsibilities:
- Transform raw data into meaningful information to support business decisions.
- Develop, deploy, and troubleshoot ETL workflows using Hive, Pig, and Sqoop.
- Develop MapReduce jobs for data clean-up in Python and C# (a streaming sketch follows this section).
- Design dimensions and facts and perform data modelling for reporting purposes.
- Design and implement the data mart and improve its performance.
- Identify reports for decision-making and create dashboards using PowerBI.
- Perform data analysis and management reporting: metrics generation, data discovery, dashboards, and scorecards.
- Write generic SQL stored procedures and complex T-SQL statements for report generation.
- Create complex graphical representations of service health for decision-making.
- Maintain the HDInsight clusters, troubleshoot issues, and coordinate with partners.
- Prepare metrics on team bandwidth utilization, identify the root causes of spikes, and apply improvements.
- Identify process gaps and define processes to optimize support.
- Extensive use of the Azure Portal, Azure PowerShell, storage accounts, certificates, and Azure data management.
- Excellent knowledge of Confidential cloud technology, components, and subscription management.
- Prepare and present metrics on team utilization and environment status in PowerBI, PowerPoint, and SQL Azure.
- Responsible for team goal setting, timely feedback, and improving team performance.
Environment: Hadoop, Hortonworks, HDInsight 3.1, YARN, Oozie, Pig, Hive, Sqoop, PowerBI, Azure Storage and SQL Server 2014.
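A minimal Hadoop Streaming sketch of the Python data clean-up jobs mentioned above; the field layout, key position, and HDFS paths are assumptions.

```python
#!/usr/bin/env python
# mapper.py - hypothetical Hadoop Streaming mapper for data clean-up:
# drops malformed rows and keys each record by its id so the reducer can de-duplicate.
import sys

EXPECTED_FIELDS = 5  # assumed layout: id, name, date, amount, status

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) != EXPECTED_FIELDS:
        continue                      # discard malformed rows
    record_id = fields[0].strip()
    if not record_id:
        continue                      # discard rows with no key
    print("%s\t%s" % (record_id, "\t".join(f.strip() for f in fields[1:])))
```

```python
#!/usr/bin/env python
# reducer.py - rows arrive grouped by key; keep the first copy of each record id.
import sys

last_key = None
for line in sys.stdin:
    key, _, value = line.rstrip("\n").partition("\t")
    if key != last_key:
        print("%s\t%s" % (key, value))
        last_key = key
```

Submitted with the streaming jar, e.g. `hadoop jar hadoop-streaming.jar -files mapper.py,reducer.py -mapper "python mapper.py" -reducer "python reducer.py" -input /data/raw -output /data/clean` (jar location and paths are placeholders).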
Confidential, PA
ETL System Engineer SQL DBA
Responsibilities:
- Deploy and Troubleshoot ETL jobs that use SSIS packages.
- Hands-on experience with data integrity processes and data modelling concepts.
- Manage and troubleshoot the multidimensional data cubes developed in SSAS.
- Manage large data movements with partitioning and data management.
- Back up and restore databases and troubleshoot issues.
- Set up Always On, including the Windows cluster, and troubleshoot issues.
- Set up snapshot and transactional replication in the Always On environment.
- Troubleshoot SQL Server performance issues and optimize T-SQL statements.
- Use the Business Intelligence tools (SSIS, SSAS, and SSRS) extensively.
- Manage security through logins, users, permissions, certificates, credentials, and encryption schemes per requirements.
- SQL Server space, storage, and database management for OLAP systems.
- Extensive use of the SQL Server utilities BCP, TableDiff, DTA, DTEXEC, Profiler, and SQLCMD.
- Migrate databases to the SQL Azure cloud platform and perform the related performance tuning.
- Build and maintain environments on Azure IaaS and PaaS.
- Lead the team and manage project deliverables in an onsite/offshore model.
- Responsible for team goal setting, timely feedback, and improving team performance.
Environment: SQL Server 2005, 2008R2 & 2012, IIS, Windows 2012, SSIS, SSAS, SSRS, Entity Framework.
Confidential, Chambersburg, PA
Senior Support Analyst
Responsibilities:
- Resolve issues related to the enterprise data warehouse (EDW) and stored procedures in the OLTP system; analyse, design, and develop ETL strategies.
- Identified performance issues in existing sources, targets, and mappings by analysing the data flow and evaluating transformations, and tuned them accordingly for better performance.
- Worked with heterogeneous sources to extract data from Oracle databases, XML, and flat files and load it into a relational Oracle warehouse.
- Troubleshot standard and reusable mappings and mapplets using various transformations such as Expression, Aggregator, Joiner, Router, Lookup (connected and unconnected), and Filter.
- Tuned SQL queries and stored procedures for faster data extraction to resolve and troubleshoot issues in the OLTP environment.
- Troubleshot long-running sessions and fixed the related issues.
- Worked with variables and parameters in the mappings to pass values between sessions.
- Involved in the development of PL/SQL stored procedures, functions, and packages to process business data in the OLTP system.
- Worked with the Services and Portal teams on various occasions on data issues in the OLTP system.
- Worked with the testing team to resolve bugs related to day-one ETL mappings before production.
- Created weekly project status reports, tracked task progress against the schedule, and reported risks and contingency plans to management and business users.
Environment: Informatica PowerCenter 8.6.1, Oracle 11g/10g/9i/8i, PL/SQL, SQL Developer 3.0.1, Toad 11.