Sr. Big Data Engineer Resume

Bellevue, WA

SUMMARY:

  • Overall 13 years of experience in the IT industry across various roles (Big Data Engineer, ETL Systems Engineer and Database Administrator).
  • Senior Big Data Engineer and Hadoop Developer.
  • Excellent understanding/knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm.
  • Transforming and retrieving data using Spark, Impala, Pig, Hive, SSIS and MapReduce.
  • Streaming data from cloud (AWS, Azure) and on-premises sources using Spark and Flume.
  • Experience in developing customized UDFs in Python to extend Hive and Pig Latin functionality (a sketch follows this list).
  • Importing and exporting data between HDFS and relational database systems using Sqoop.
  • Extensive use of the open-source languages Perl, Python, Scala and Java.
  • Excellent knowledge of and extensive experience with NoSQL databases (HBase).
  • Experience with Hadoop Streaming, writing MapReduce jobs in Perl and Python in addition to Java.
  • Excellent knowledge of and extensive experience with WebHDFS REST API commands.
  • Experience in automation and building CI/CD pipelines using Jenkins and Chef.
  • Developing generic SQL procedures and complex T-SQL statements for report generation.
  • Hands-on experience in data modeling with Star and Snowflake schemas.
  • Business Intelligence Systems Engineering and Database Administration.
  • MCP Database Administrator for SQL Server 2005, 2008, 2008 R2, 2012 and 2014.
  • Excellent knowledge on Business Intelligence tools SSIS, SSAS, SSRS, Informatica and PowerBI.
  • Design and implement data distribution mechanisms on SQL Server (Transactional, Snapshot and Merge replication, SSIS and DTS).
  • High Availability and Disaster Recovery Systems Design and Implementation on SQL Server (Always On, Mirroring and Log Shipping).
  • Hands-on experience with SQL Server Failover Clustering in an Active/Passive model.
  • Database Backup, Database Restore, Data Recovery and Data Protection on SQL Server.
  • SQL Server Capacity planning, Space Management, Data Partition and Data Management.
  • Excellent knowledge on Database/Data Warehousing concepts such as Normalization, Entity-Relationship Modeling, Dimensional Data Modeling, Schema and Metadata.
  • Monitoring Data Activities (Database Status, Logs, Space Utilization, Extents, Checkpoints, Locks and Long Transactions) and applying improvements.
  • Excellent knowledge of Confidential Azure Services, Amazon Web Services and their management.
  • Side-by-side upgrades, in-place upgrades and data migration.
  • Incident Management, SLA Management, TSG Maintenance and FTM Improvement.
  • Effectively planning and managing project deliverables in an on-site and offshore model to improve client satisfaction.
  • Responsible for team goal setting, timely feedback and performance improvement.
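
A minimal sketch of the kind of Python UDF referenced above, used to extend Hive through its TRANSFORM clause; the script name, column layout and normalization rule are illustrative, not taken from any actual project:

#!/usr/bin/env python
# normalize_phone.py - hypothetical Hive "UDF" invoked via TRANSFORM.
# Hive streams tab-separated rows on stdin; we emit transformed rows on stdout.
# Usage in Hive:
#   ADD FILE normalize_phone.py;
#   SELECT TRANSFORM (id, phone) USING 'python normalize_phone.py' AS (id, phone) FROM contacts;
import re
import sys

def normalize(raw):
    """Keep digits only and format 10-digit US numbers; pass anything else through."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 10:
        return "%s-%s-%s" % (digits[:3], digits[3:6], digits[6:])
    return raw

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    fields[-1] = normalize(fields[-1])  # assume the phone number is the last column
    print("\t".join(fields))

The same stdin/stdout contract serves Pig as well, via its STREAM operator.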

TECHNICAL SKILLS:

Big Data: Cloudera, Impala, Pig, Hive, Sqoop, Spark, Oozie, Flume, Hue, HDInsight, Zeppelin, Qubole

BI Tools: SSIS, SSAS, SSRS, Informatica PowerCenter, PowerBI, Zoom-Data

Languages: SQL, Perl, Scala, Java, PowerShell, Python, C#, VB.Net

Server Technologies: ASP.Net, ASP, Hibernate, JSP, JavaScript, XML, JSON

SQL Tools: BCP, TableDiff, DTA, SSIS, SQL Profiler

Operating Systems: Red Hat 7, Ubuntu 12.x, Windows 7/8, Windows Server 2003/2008/2012, CentOS 6.0

Tools: Eclipse, IntelliJ, SQL Developer, Toad, VSTF, GIT, JIRA

DevOps Tools: Jenkins, Chef

Databases: HDFS, HBase, SQL Server, SQL Azure, Oracle

PROFESSIONAL EXPERIENCE:

Confidential, Bellevue, WA.

Sr. Big Data Engineer

Responsibilities:

  • Creating data pipelines that gather, clean and optimize data using Hive and Spark.
  • Gathering data stored in AWS S3 from various third-party vendors, optimizing it and joining it with internal datasets to derive meaningful information.
  • Combining various datasets in Hive to generate business reports.
  • Using partitioning and bucketing in Hive to optimize queries (see the sketch after this list).
  • Storing data in the ORC, Parquet and Avro file formats with compression.
  • Moving data between cloud and on-premises Hadoop using DistCp and a proprietary ingest framework.
  • Scheduling Workflows, Coordinators and Bundles using Oozie.
  • Using the Spark DataFrame API in Scala for analyzing data.
  • Using the Hadoop-on-cloud service Qubole to process data in AWS S3 buckets.
  • Programming in Java and Scala.
  • Using Jenkins and Maven as build tools.
  • Building a continuous integration and deployment pipeline using Jenkins and Chef.
  • Working in an Agile methodology.
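
The Spark work on this project is in Scala; purely as an illustration, and to keep one language across these sketches, here is the same pattern in the equivalent PySpark DataFrame API: read vendor data from S3, join it with an internal dataset, and write compressed ORC that is partitioned and bucketed. The bucket count, table and column names, and the S3 path are assumptions:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("vendor-enrichment")
         .config("spark.sql.orc.compression.codec", "zlib")  # compressed ORC output
         .enableHiveSupport()
         .getOrCreate())

# Third-party vendor files landed in S3 (path is illustrative).
vendor = spark.read.option("header", "true").csv("s3a://vendor-bucket/daily/")

# Hypothetical internal Hive table to join against.
internal = spark.table("warehouse.accounts")

enriched = vendor.join(internal, on="account_id", how="inner")

# Partitioning lets date-bounded queries prune partitions; bucketing speeds up
# later joins and aggregations on account_id.
(enriched.write
 .mode("overwrite")
 .partitionBy("load_date")
 .bucketBy(8, "account_id")
 .format("orc")
 .saveAsTable("warehouse.vendor_enriched"))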

Environment: Hadoop, Hortonworks, Spark, Zeppelin, Qubole, Oozie, Hive, Sqoop, AWS, Jenkins, Chef, Red Hat Linux and Teradata.

Confidential, Snoqualmie, WA.

Sr. Big Data Engineer

Responsibilities:

  • Design and Develop Data Collectors and Parsers using Perl or Python.
  • Experience in developing customized UDFs in Python to extend Hive and Pig Latin functionality.
  • Design and Develop Parsers for different file formats (CSV, XML, Binary, ASCII, Text, etc.).
  • Data Import and Export from various sources through scripts and Sqoop.
  • Extensive usage of Spark for data streaming and data transformation for real-time analytics.
  • Extensively using WebHDFS REST API commands in Perl scripting (see the sketch after this list).
  • Big Data management in Hive and Impala (Tables, Partitioning, ETL, etc.).
  • Extensive usage of Hue and other Cloudera tools.
  • Extensive usage of the NoSQL (HBase) database.
  • Design and Create Dimension and Fact Tables as per the KPIs.
  • Design and Develop Dashboards with KPIs as per the Metrics.
  • Design and Develop Dashboards in Zoom-Data, writing complex queries and data aggregations.
  • Extensive usage of the Cloudera Hadoop distribution.
  • Shell Programming and Crontab automation.
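
The WebHDFS calls mentioned above are scripted in Perl on this project; the HTTP API is language-neutral, so here is an equivalent sketch in Python, consistent with the other sketches. The NameNode host, port and paths are illustrative:

import requests

WEBHDFS = "http://namenode.example.com:50070/webhdfs/v1"

def list_dir(path, user="hdfs"):
    """LISTSTATUS: list the entries under an HDFS directory."""
    r = requests.get(WEBHDFS + path, params={"op": "LISTSTATUS", "user.name": user})
    r.raise_for_status()
    return r.json()["FileStatuses"]["FileStatus"]

def upload(local_path, hdfs_path, user="hdfs"):
    """CREATE: WebHDFS writes are two-step - the NameNode redirects to a DataNode."""
    r = requests.put(WEBHDFS + hdfs_path,
                     params={"op": "CREATE", "overwrite": "true", "user.name": user},
                     allow_redirects=False)
    with open(local_path, "rb") as f:
        requests.put(r.headers["Location"], data=f).raise_for_status()

for entry in list_dir("/data/incoming"):
    print(entry["pathSuffix"], entry["length"])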

Environment: Hadoop, Cloudera, Spark, Zeppelin, Hue, Impala, Pig, Hive, Sqoop, Zoom-Data, Red Hat Linux and Oracle.

Confidential, Redmond, WA.

BI Engineer

Responsibilities:

  • Transforming raw data into meaningful information to support business decisions.
  • Developing, deploying and troubleshooting ETL workflows using Hive, Pig and Sqoop.
  • Developing MapReduce jobs for data cleanup in Python and C# (see the sketch after this list).
  • Designing dimensions and fact tables, and data modeling for reporting purposes.
  • Designing and implementing the DataMart and improving its performance.
  • Identifying the reports needed for decision making and creating dashboards in PowerBI.
  • Data analysis and management reporting: generating metrics, data discovery, dashboards and scorecards.
  • Writing generic SQL procedures and complex T-SQL statements for report generation.
  • Creating complex graphical representations of service health for decision making.
  • Maintaining the HDInsight clusters, troubleshooting issues and coordinating with partners.
  • Preparing metrics on team bandwidth utilization, identifying the root causes of spikes and applying improvements.
  • Identifying process gaps and defining processes to optimize support.
  • Extensive usage of the Azure Portal, Azure PowerShell, Storage Accounts, Certificates and Azure Data Management.
  • Excellent knowledge of Confidential Cloud technology, components and subscription management.
  • Preparing and presenting metrics on team utilization and environment status in PowerBI, PowerPoint and SQL Azure.
  • Responsible for team goal setting, timely feedback and performance improvement.
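
A minimal sketch of the data-cleanup MapReduce work described above, written as a map-only Hadoop Streaming job in Python; the delimiter, field count and validation rules are assumptions:

#!/usr/bin/env python
# clean_mapper.py - hypothetical Hadoop Streaming mapper that drops malformed
# rows and trims whitespace. Run map-only, e.g.:
#   hadoop jar hadoop-streaming.jar -D mapreduce.job.reduces=0 \
#     -files clean_mapper.py -mapper clean_mapper.py \
#     -input /raw/events -output /clean/events
import sys

EXPECTED_FIELDS = 5  # illustrative schema width

for line in sys.stdin:
    fields = [f.strip() for f in line.rstrip("\n").split(",")]
    if len(fields) != EXPECTED_FIELDS:
        continue  # drop rows with the wrong column count
    if not fields[0]:
        continue  # require a non-empty record id
    print("\t".join(fields))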

Environment: Hadoop, Hortonworks, HDInsight 3.1, YARN, Oozie, Pig, Hive, Sqoop, PowerBI, Azure Storage and SQL Server 2014.

Confidential, Redmond, WA.

ETL System Engineer SQL DBA

Responsibilities:

  • Deploying and troubleshooting ETL jobs that use SSIS packages.
  • Hands-on experience with data integrity processes and data modeling concepts.
  • Managing and troubleshooting the multi-dimensional data cubes developed in SSAS.
  • Managing large data movements with partitioning and data management.
  • Backing up and restoring databases and troubleshooting related issues (see the sketch after this list).
  • Setting up Always On, including the Windows cluster, and troubleshooting issues.
  • Setting up Snapshot and Transactional replication in an Always On environment.
  • Troubleshooting SQL Server performance issues and optimizing T-SQL statements.
  • Extensively using the Business Intelligence tools (SSIS, SSAS and SSRS).
  • Managing security through Logins, Users, Permissions, Certificates, Credentials and Encryption Schemes as per the requirements.
  • SQL Server space, storage and database management for OLAP systems.
  • Extensive usage of the SQL Server utilities BCP, TableDiff, DTA, DTEXEC, Profiler and SQLCMD.
  • Migrating databases to the SQL Azure cloud platform and tuning their performance.
  • Building and maintaining environments on Azure IaaS and PaaS.
  • Leading the team and managing project deliverables in an on-site and offshore model.
  • Responsible for team goal setting, timely feedback and performance improvement.
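
A minimal sketch of the backup routine mentioned above, driving T-SQL from Python with pyodbc to stay in one language across these sketches; the actual work here was done with SQL Server tooling directly, and the server, database and backup path are illustrative:

import pyodbc

# BACKUP/RESTORE cannot run inside a transaction, hence autocommit=True.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=sqlprod01;"
    "DATABASE=master;Trusted_Connection=yes;",
    autocommit=True,
)
cur = conn.cursor()
cur.execute(r"""
    BACKUP DATABASE [SalesDW]
    TO DISK = N'E:\Backups\SalesDW_full.bak'
    WITH COMPRESSION, CHECKSUM, INIT;
""")
while cur.nextset():  # drain the informational result sets so the backup completes
    pass
print("backup complete")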

Environment: SQL Server 2005, 2008 R2 & 2012, IIS, Windows 2012, SSIS, SSAS, SSRS, Entity Framework.

Confidential.

Senior Support Analyst

Responsibilities:

  • Resolved issues related to the enterprise data warehouse (EDW) and stored procedures in the OLTP system; analyzed, designed and developed ETL strategies.
  • Identified performance issues in existing sources, targets and mappings by analyzing the data flow and evaluating transformations, and tuned them for better performance.
  • Worked with heterogeneous sources to extract data from Oracle databases, XML and flat files and load it into a relational Oracle warehouse.
  • Troubleshot standard and reusable mappings and mapplets using various transformations such as Expression, Aggregator, Joiner, Router, Lookup (Connected and Unconnected) and Filter.
  • Performed tuning of SQL queries and stored procedures for faster data extraction to resolve and troubleshoot issues in the OLTP environment.
  • Troubleshot long-running sessions and fixed related issues.
  • Worked with Variables and Parameters in the mappings to pass values between sessions.
  • Involved in the development of PL/SQL stored procedures, functions and packages to process business data in the OLTP system (see the sketch after this list).
  • Worked with the Services and Portal teams on various occasions to resolve data issues in the OLTP system.
  • Worked with the testing team to resolve bugs related to day one ETL mappings before production.
  • Created weekly project status reports, tracked the progress of tasks against the schedule and reported risks and contingency plans to management and business users.
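
The stored-procedure work above was done in PL/SQL itself; purely as an illustration, and to stay in one language across these sketches, calling such a procedure from Python with cx_Oracle looks roughly like this. The connection string, package and procedure names, and parameters are hypothetical:

import cx_Oracle

conn = cx_Oracle.connect("etl_user", "secret", "dbhost:1521/ORCL")
cur = conn.cursor()

# Hypothetical procedure: one IN parameter (a business date) and one OUT
# parameter returning the number of rows processed.
rows_done = cur.var(int)
cur.callproc("billing_pkg.process_orders", ["2011-01-31", rows_done])
conn.commit()
print("rows processed:", rows_done.getvalue())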

Environment: Informatica PowerCenter 8.6.1, Oracle 11g/10g/9i/8i, PL/SQL, SQL Developer 3.0.1, Toad 11.
