Sr. Big Data Engineer Resume
Bellevue, WA
SUMMARY:
- Over 13 years of experience in the IT industry in various roles: Big Data Engineer, ETL Systems Engineer, and Database Administrator.
- Senior Big Data Engineer and Hadoop Developer.
- Excellent understanding of Hadoop architecture and its components: HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Transform and retrieve data using Spark, Impala, Pig, Hive, SSIS, and MapReduce.
- Stream data from cloud (AWS, Azure) and on-premises sources using Spark and Flume.
- Experience developing custom UDFs in Python to extend Hive and Pig Latin functionality (see the sketch at the end of this summary).
- Import and export data with Sqoop between HDFS and relational database systems.
- Extensive use of the open-source languages Perl, Python, Scala, and Java.
- Excellent knowledge of and extensive experience with NoSQL databases (HBase).
- Experience with Hadoop Streaming, writing MapReduce jobs in Perl and Python as well as Java.
- Extensive use of the WebHDFS REST API.
- Experience automating and building CI/CD pipelines with Jenkins and Chef.
- Develop generic SQL procedures and complex T-SQL statements for report generation.
- Hands-on experience in data modeling with star and snowflake schemas.
- Background as a Business Intelligence Systems Engineer and Database Administrator.
- MCP Database Administrator for SQL Server 2005, 2008, 2008 R2, 2012 and 2014.
- Excellent knowledge of the Business Intelligence tools SSIS, SSAS, SSRS, Informatica, and PowerBI.
- Design and implement data distribution mechanisms on SQL Server (transactional, snapshot, and merge replication; SSIS; DTS).
- Design and implement high availability and disaster recovery systems on SQL Server (Always On, mirroring, and log shipping).
- Hands-on experience with SQL Server failover clustering in an active/passive model.
- Database Backup, Database Restore, Data Recovery and Data Protection on SQL Server.
- SQL Server Capacity planning, Space Management, Data Partition and Data Management.
- Excellent knowledge of database/data warehousing concepts such as normalization, entity-relationship modeling, dimensional data modeling, schemas, and metadata.
- Monitor data activities (database status, logs, space utilization, extents, checkpoints, locks, and long transactions) and apply improvements.
- Excellent knowledge of Confidential Azure services, Amazon Web Services, and their management.
- Side-by-side upgrades, in-place upgrades, and data migration.
- Incident Management, SLA Management, TSG Maintenance and FTM Improvement.
- Effectively plan and manage project deliverables in an on-site/offshore model and improve client satisfaction.
- Responsible for team goal setting, timely feedback, and performance improvement.
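Illustrative of the Python-UDF approach above: Hive's TRANSFORM clause streams rows to a script over stdin/stdout, which is how Python extends Hive without writing a Java UDF. A minimal sketch; the script name, column layout, and normalization rule are hypothetical.

```python
#!/usr/bin/env python
# Hypothetical Hive "UDF" via the TRANSFORM clause: Hive streams rows to
# stdin as tab-separated text and reads transformed rows back from stdout.
# Wired into HiveQL roughly as:
#   ADD FILE normalize_udf.py;
#   SELECT TRANSFORM(user_id, raw_phone)
#     USING 'python normalize_udf.py' AS (user_id, phone)
#   FROM raw_contacts;
import sys

for line in sys.stdin:
    user_id, raw_phone = line.rstrip("\n").split("\t")
    # Keep digits only, e.g. "(425) 555-0100" -> "4255550100".
    phone = "".join(ch for ch in raw_phone if ch.isdigit())
    print("%s\t%s" % (user_id, phone))
```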
TECHNICAL SKILLS:
Big Data: Cloudera, Impala, Pig, Hive, Sqoop, Spark, Oozie, Flume, Hue, HDInsight, Zeppelin, Qubole
BI Tools: SSIS, SSAS, SSRS, Informatica PowerCenter, PowerBI, Zoom-Data
Languages: SQL, Perl, Scala, Java, PowerShell, Python, C#, VB.Net
Server Technologies: ASP.Net, ASP, Hibernate, JSP, JavaScript, XML, JSON
SQL Tools: BCP, TableDiff, DTA, SSIS, SQL Profiler
Operating Systems: Red Hat 7, Ubuntu 12.x, Windows 7/8, Windows Server 2003/2008/2012, CentOS 6.0
Tools: Eclipse, IntelliJ, SQL Developer, Toad, VSTF, GIT, JIRA
DevOps Tools: Jenkins, Chef
Databases: HDFS, HBase, SQL Server, SQL Azure, Oracle
PROFESSIONAL EXPERIENCE:
Confidential, Bellevue, WA.
Sr. Big Data Engineer
Responsibilities:
- Create data pipelines that gather, clean, and optimize data using Hive and Spark.
- Gather third-party vendor data stored in AWS S3, optimize it, and join it with internal datasets to derive meaningful information.
- Combine various datasets in Hive to generate business reports.
- Use partitioning and bucketing in Hive to optimize queries (see the sketch after this list).
- Store data in ORC, Parquet, and Avro file formats with compression.
- Move data between cloud and on-premises Hadoop using DistCp and a proprietary ingest framework.
- Schedule workflows, coordinators, and bundles using Oozie.
- Use the Spark DataFrame API in Scala to analyze data.
- Use a Hadoop-on-cloud service (Qubole) to process data in AWS S3 buckets.
- Program in Java and Scala.
- Use Jenkins and Maven as build tools.
- Build a continuous integration and deployment pipeline with Jenkins and Chef.
- Work in an Agile methodology.
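A minimal sketch of the Hive partitioning and bucketing technique referenced above, issued here through a Hive-enabled PySpark session; the table name, columns, and bucket count are hypothetical.

```python
from pyspark.sql import SparkSession

# Hive-enabled session; assumes Spark is configured against the Hive metastore.
spark = (SparkSession.builder
         .appName("hive-partition-bucket-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Partition by event date so queries filtered on a day scan one directory;
# bucket by user_id so joins and aggregations on user_id avoid full scans.
spark.sql("""
    CREATE TABLE IF NOT EXISTS events_opt (
        user_id BIGINT,
        event_type STRING,
        payload STRING
    )
    PARTITIONED BY (event_date STRING)
    CLUSTERED BY (user_id) INTO 32 BUCKETS
    STORED AS ORC
""")

# Pruned read: only the 2019-01-01 partition directory is touched.
daily = spark.sql("SELECT event_type, COUNT(*) AS cnt FROM events_opt "
                  "WHERE event_date = '2019-01-01' GROUP BY event_type")
daily.show()
```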
Environment: Hadoop, Hortonworks, Spark, Zeppelin, Qubole, Oozie, Hive, Sqoop, AWS, Jenkins, Chef, Red Hat Linux and Teradata.
Confidential, Snoqualmie, WA.
Sr. Big Data Engineer
Responsibilities:
- Design and develop data collectors and parsers in Perl and Python.
- Develop custom UDFs in Python to extend Hive and Pig Latin functionality.
- Design and develop parsers for different file formats (CSV, XML, binary, ASCII, text, etc.).
- Import and export data from various sources via scripts and Sqoop.
- Extensive use of Spark for data streaming and transformation for real-time analytics.
- Extensive use of the WebHDFS REST API from Perl scripts (see the sketch after this list).
- Manage big data in Hive and Impala (tables, partitioning, ETL, etc.).
- Extensive use of Hue and other Cloudera tools.
- Extensive use of the NoSQL database HBase.
- Design and create dimension and fact tables according to the KPIs.
- Design and develop dashboards with KPIs according to the metrics.
- Design and develop dashboards in Zoom-Data, writing complex queries and data aggregations.
- Extensive use of the Cloudera Hadoop distribution.
- Shell programming and crontab automation.
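A minimal sketch of the WebHDFS REST calls referenced above, shown in Python for brevity (the same operations were driven from Perl); the NameNode host, port, user, and paths are hypothetical.

```python
import requests

# Hypothetical NameNode endpoint; 50070 is the classic default WebHDFS port.
WEBHDFS = "http://namenode.example.com:50070/webhdfs/v1"
USER = "hdfs"  # hypothetical run-as user

def list_status(path):
    """LISTSTATUS: directory listing, the WebHDFS analogue of `hdfs dfs -ls`."""
    r = requests.get("%s%s" % (WEBHDFS, path),
                     params={"op": "LISTSTATUS", "user.name": USER})
    r.raise_for_status()
    return r.json()["FileStatuses"]["FileStatus"]

def open_file(path):
    """OPEN: read a file's bytes; WebHDFS redirects to a DataNode,
    which requests follows automatically."""
    r = requests.get("%s%s" % (WEBHDFS, path),
                     params={"op": "OPEN", "user.name": USER})
    r.raise_for_status()
    return r.content

for status in list_status("/data/incoming"):
    print(status["pathSuffix"], status["length"])
```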
Environment: Hadoop, Cloudera, Spark, Zeppelin, Hue, Impala, Pig, Hive, Sqoop, Zoom-Data, Red Hat Linux and Oracle.
Confidential, Redmond, WA.
BI Engineer
Responsibilities:
- Transform raw data into meaningful information to support business decisions.
- Develop, deploy, and troubleshoot ETL workflows using Hive, Pig, and Sqoop.
- Develop MapReduce jobs for data cleanup in Python and C# (see the sketch after this list).
- Design dimensions and facts and model data for reporting purposes.
- Design and implement the data mart and improve its performance.
- Identify reports for decision-making and create dashboards using PowerBI.
- Data analysis and management reporting: generate metrics, data discovery, dashboards, and scorecards.
- Write generic SQL procedures and complex T-SQL statements for report generation.
- Create complex graphical representations of service health for decision making.
- Maintain the HDInsight clusters, troubleshoot issues, and coordinate with partners.
- Prepare metrics on team bandwidth utilization, identify the root causes of spikes, and apply improvements.
- Identify process gaps and define processes to optimize support.
- Extensive use of the Azure Portal, Azure PowerShell, storage accounts, certificates, and Azure data management.
- Excellent knowledge of Confidential cloud technology, components, and subscription management.
- Prepare and present team utilization and environment status metrics in PowerBI, PowerPoint, and SQL Azure.
- Responsible for team goal setting, timely feedback, and performance improvement.
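A minimal sketch of a Python data-cleanup mapper of the kind referenced above, run under Hadoop Streaming with no reduce phase; the schema width and validation rules are hypothetical.

```python
#!/usr/bin/env python
# Hypothetical Hadoop Streaming mapper that drops malformed rows.
# Submitted roughly as:
#   hadoop jar hadoop-streaming.jar \
#     -input /raw/logs -output /clean/logs \
#     -mapper clean_mapper.py -file clean_mapper.py \
#     -numReduceTasks 0
import sys

EXPECTED_FIELDS = 5  # hypothetical schema width

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    # Drop rows with the wrong arity or a non-numeric id in column 0.
    if len(fields) != EXPECTED_FIELDS or not fields[0].isdigit():
        continue
    print("\t".join(f.strip() for f in fields))
```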
Environment: Hadoop, Hortonworks, HDInsight 3.1, YARN, Oozie, Pig, Hive, Sqoop, PowerBI, Azure Storage and SQL Server 2014.
Confidential, Redmond, WA.
ETL Systems Engineer / SQL DBA
Responsibilities:
- Deploy and troubleshoot ETL jobs that use SSIS packages.
- Hands-on experience with data integrity processes and data modeling concepts.
- Manage and troubleshoot the multidimensional data cubes developed in SSAS.
- Manage large data movements with partitioning and data management.
- Back up and restore databases and troubleshoot issues (see the sketch after this list).
- Set up Always On, including the Windows cluster, and troubleshoot issues.
- Set up snapshot and transactional replication in an Always On environment.
- Troubleshoot SQL Server performance issues and optimize T-SQL statements.
- Extensive use of the Business Intelligence tools SSIS, SSAS, and SSRS.
- Manage security through logins, users, permissions, certificates, credentials, and encryption schemes as required.
- SQL Server space, storage, and database management for OLAP systems.
- Extensive use of the SQL Server utilities BCP, TableDiff, DTA, DTEXEC, Profiler, and SQLCMD.
- Migrate databases to the SQL Azure cloud platform and handle the associated performance tuning.
- Build and maintain environments on Azure IaaS and PaaS.
- Lead the team and manage project deliverables in an on-site/offshore model.
- Responsible for team goal setting, timely feedback, and performance improvement.
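A minimal sketch of a scripted full backup of the kind referenced above, issued from Python via pyodbc rather than an agent job; the server, database, and backup path are hypothetical.

```python
import pyodbc

# Hypothetical connection; BACKUP DATABASE cannot run inside a transaction,
# so autocommit must be enabled on the connection.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sqlprod01;DATABASE=master;Trusted_Connection=yes;",
    autocommit=True)

# Compressed, checksummed full backup that overwrites the target file.
conn.execute("""
    BACKUP DATABASE SalesDW
    TO DISK = N'D:\\Backups\\SalesDW_full.bak'
    WITH COMPRESSION, CHECKSUM, INIT
""")
conn.close()
```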
Environment: SQL Server 2005, 2008 R2 & 2012, IIS, Windows 2012, SSIS, SSAS, SSRS, Entity Framework.
Confidential.
Senior Support Analyst
Responsibilities:
- Resolved issues related to the enterprise data warehouse (EDW) and stored procedures in the OLTP system; analyzed, designed, and developed ETL strategies.
- Identified performance issues in existing sources, targets, and mappings by analyzing data flow and evaluating transformations, then tuned accordingly for better performance.
- Worked with heterogeneous sources, extracting data from Oracle databases, XML, and flat files and loading it into a relational Oracle warehouse.
- Troubleshot standard and reusable mappings and mapplets using transformations such as Expression, Aggregator, Joiner, Router, Lookup (connected and unconnected), and Filter.
- Tuned SQL queries and stored procedures for faster data extraction to resolve and troubleshoot issues in the OLTP environment.
- Troubleshot long-running sessions and fixed the related issues.
- Worked with variables and parameters in mappings to pass values between sessions.
- Developed PL/SQL stored procedures, functions, and packages to process business data in the OLTP system (see the sketch after this list).
- Worked with the Services and Portal teams on various occasions to resolve data issues in the OLTP system.
- Worked with the testing team to resolve bugs in day-one ETL mappings before production.
- Created weekly project status reports, tracked task progress against schedule, and reported risks and contingency plans to management and business users.
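A minimal sketch of exercising a PL/SQL procedure of the kind referenced above from Python with cx_Oracle; the connection details, package, procedure name, and parameters are all hypothetical.

```python
import cx_Oracle

# Hypothetical connection details for the OLTP system.
conn = cx_Oracle.connect("etl_user", "secret",
                         "oltp-host.example.com:1521/ORCL")
cur = conn.cursor()

# Call a hypothetical packaged procedure that processes one day of business
# data and reports the number of rows it touched via an OUT parameter.
rows_processed = cur.var(int)
cur.callproc("pkg_orders.process_daily", ["2011-06-01", rows_processed])
print("rows processed:", rows_processed.getvalue())

conn.commit()
cur.close()
conn.close()
```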
Environment: Informatica PowerCenter 8.6.1, Oracle 11g/10g/9i/8i, PL/SQL, SQL Developer 3.0.1, Toad 11.