Sr. Bigdata Engineer Resume

Jersey City, NJ

SUMMARY:

  • Over 14 years of IT experience in application software requirement analysis, design, development, testing, implementation and maintenance, including Big Data and Hadoop ecosystem technologies.
  • 5 years of strong experience working on Apache Hadoop ecosystem components like HDFS, YARN, HBase, Hive, Sqoop, Pig, Oozie, ZooKeeper, Flume and Spark, along with Python 3.6, on HDP 2.6 and AWS EMR 2.7.
  • Experience in building an on-prem data lake.
  • Hands-on experience with AWS cloud services (EC2, S3, RDS, CloudWatch, Redshift, EMR, Kinesis, Athena, Elasticsearch, Aurora, Glue Catalog and Lambda).
  • Used Oozie, crontab and the Control-M workflow engine for managing and scheduling Hadoop jobs.
  • Migrated an existing on-premises application to AWS. Used AWS services like EC2 and S3 for processing and storing small data sets.
  • Experience with different data formats like JSON, Avro, Parquet, RC and ORC, and compression codecs like Snappy and bzip.
  • Proficient in big data ingestion and streaming tools like Flume, Sqoop, Kafka, and Storm.
  • Developed proof-of-concept projects to validate new architectures and solutions.
  • Experience in transferring data from RDBMS to HDFS and Hive tables using Sqoop.
  • Experience in working with Flume to load log data from multiple sources directly into HDFS.
  • Extensive experience in Spark Scala (RDDs, DataFrames and Datasets) and Python scripting (pandas, NumPy).
  • Knowledge of manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.
  • Scripting (Hadoop related): Developed “Business Intelligence” scripts for data analysis in Pig and Hive. Executed workflows using Oozie.
  • Experience supporting systems with 24X7 availability and monitoring.
  • Worked with version control systems like SVN and GitHub.
  • Experience in designing and architecting Hadoop applications and recommending the right tools and technologies.
  • Experience in developing MapReduce and Spark code in Scala and Python.
  • Hands-on experience with Hadoop ingestion/data lake tools like Bedrock and Talend.
  • Experienced in creating complex mappings using various transformations, and developing strategies for Extraction, Transformation and Loading (ETL) mechanism.
  • Extensive experience (8+ years) working with onsite-offshore models for multiple projects.
  • Experience working in an agile environment (Jira Kanban boards, daily Scrum calls, sprint meetings).
  • Excellent interpersonal, communication, documentation and presentation skills.

TECHNICAL SKILLS:

Hadoop Ecosystem: AWS EMR 5.x, S3, EC2, Redshift, Athena, Elasticsearch, CloudWatch, Lambda, Kinesis Firehose, Glue Catalog, HDP 2.3/2.6/3.x, CDH 5.7.4/5.14/6.0.1, Ambari 2.6/2.7, Cloudera Manager, Cloudera Navigator, Hadoop 2.7.2/3.0.0, Sentry 2.0.0, HBase 2.0.2, Impala 3.0.0, ZooKeeper 3.4.5, Hive 2.1.1, Pig 0.17.0, Sqoop 1.4.7, Flume 1.8.0, Oozie 5.0.0, Zeppelin, Jupyter, Hue 4.2.0, Tez, Kafka 1.0.1, R packages, Python 3.6, Spark 2.2.0, Ganglia.

Hardware/Operating Systems: Windows NT, Windows Server 2003, 2008 R2 and 2012, UNIX, AIX 4.3.x, Solaris 9/10, RHEL 5.x, 6.x, 7.x and CentOS, HP-UX 11.0 and 11i, SAN.

Development Languages: HiveQL, SQL, PL/SQL, C, C++, PHP, Python, Core Java, JavaScript, Shell Script, Perl script, Visual Source Safe, Crystal Reports, Red Gate, Erwin, Visio, GitHub

PROFESSIONAL EXPERIENCE:

Confidential, Jersey City, NJ

Sr. BigData Engineer

RESPONSIBILITIES:

  • Developing scripts for build, deployment, maintenance and related tasks using Docker, Python and Bash.
  • Involved in architecture, design, development and implementation of Hadoop deployment, backup and recovery systems.
  • Developed Hive Scripts and UDFs to process/ transform Hive data as per Business requirements.
  • Developed Hive and Pig programs to validate and cleanse data (UNIX & HDFS) obtained from heterogeneous data sources and load it into HDFS for analytics.
  • Used Sqoop to import data from different RDBMS systems like Oracle, DB2 and Netezza and loaded into HDFS.
  • Developed workflow in Oozie to manage and schedule jobs on Hadoop cluster to trigger daily, weekly and monthly batch cycles.
  • Worked on Spark batch processing and developed scripts in Spark Scala and Spark SQL.
  • Hands-on experience with AWS services such as Redshift clusters and Route 53 domain configuration.
  • Designed the end-to-end flow to deliver/consume the raw/clean logs to/from HDFS and AWS S3.
  • Used CloudWatch Logs to move application logs to S3 and create alarms based on a few exceptions raised by applications.
  • Optimized HDFS storage by storing the data in compressed formats such as Snappy, LZO and Gzip.
  • Experience in processing data in JSON, XML, Parquet and ORC file formats and storing it in Hive.
  • Experience working with different vendors in processing and utilizing the data.
  • Gained experience in storing large volumes of data on Cassandra for high availability of analytical data.
  • Used Solr/Lucene to develop an open-source enterprise search platform in a test and development environment.
  • Exported the analyzed data into relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Collaborated with the customer services to develop products implementing the insights obtained from analysis. Financial management tools were created and deployed for use by customers to help them understand their spending/saving pattern.
  • Gained knowledge of cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning and slots configuration.
  • Assisted the team in their development and deployment activities.
  • Experienced in defining job workflows as per their dependencies in Oozie.
  • Extended the data model to allow content management, including archival, annotation and check-in/check-out.
  • Developed and optimized code for bulk file transfer, update and deletion of files in the content management system using DataStax drivers.
  • Developed code to record related metrics such as file transfer rate and successful and failed transfers.

ENVIRONMENT: AWS EMR 5.x, S3, EC2, Redshift, Athena, Elasticsearch, CloudWatch, Lambda, Kinesis Firehose, Glue Catalog, Spark Scala, HDP 2.6, Ambari 2.5, Apache Hadoop 2.7.3, YARN, HDFS 2.7.3, Hive 2.1, Pig 0.16, Flume 1.5, Sqoop 1.4.6, Spark 2.0, ZooKeeper 3.4.6, Kafka 0.9, Tez 0.7, Ranger 0.6, Knox 0.6, Zeppelin 0.6, Kerberos 5, MySQL.

Confidential, Englewood Cliffs, NJ

Hadoop Developer

RESPONSIBILITIES:

  • Building data pipelines to ingest and process data in batch and real-time.
  • Loading data from different sources into the on-prem data lake.
  • Performing data profiling and data quality checks as per business requirements using HiveQL, Python and shell scripts.
  • Installed and configured Hadoop ecosystem components like HDFS, HBase, ZooKeeper, Oozie, Hive, Pig, Flume and Sqoop.
  • Managing Hadoop, Spark and MapReduce jobs using a scheduler.
  • Applying different HDFS file formats and structures such as Parquet, ORC and Avro to speed up analytics.
  • Managing and deploying Elasticsearch and Kafka clusters.
  • Working closely with data architects, data scientists and data visualization developers to design, build, test, deliver and maintain sustainable and highly scalable data solutions.
  • Created HBase tables to load large data sets coming from different sources.
  • Worked on capacity planning to manage application data and resource utilization on on-prem Hadoop cluster.
  • Worked on importing and exporting data into HDFS using Sqoop.
  • Involved in creating Hive tables, loading them with data and writing Hive queries.
  • Worked closely with enterprise data warehouse.
  • Worked in monitoring, troubleshooting and managing Hadoop log files.
  • Experienced in defining job flows in Oozie.
  • Knowledge in performance troubleshooting and tuning Hadoop clusters.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Provided cluster coordination services through ZooKeeper.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Hive, Pig and Sqoop.

ENVIRONMENT: RHEL Linux 6.x, HDP 2.4, Ambari 2.2.1, HBase 1.1.2, Apache Hadoop 2.7, YARN, HDFS, Hive 1.2, Pig 0.15, Flume 1.5, Sqoop 1.4, Spark 1.6, ZooKeeper 3.4.6, Kafka 0.9, Tez 0.7, Ranger 0.5.0, Knox 0.6, Kerberos.

Confidential, Hoboken, NJ

BigData Administrator

RESPONSIBILITIES:

  • Responsible for cluster configuration, maintenance, troubleshooting and tuning.
  • Secured deployments and worked on backup and recovery.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Rack-aware configuration; configuring client machines and monitoring and management tools.
  • File system management and monitoring. HDFS support and maintenance.
  • Responsible for building a cluster on HDP 2.3.
  • Collaborating with application teams to install operating system and Hadoop updates, patches, version upgrades when required. Point of Contact for Vendor escalation.
  • Major Upgrade from HDP 2.2 to HDP 2.3.
  • Created POC to store Server Log data into Cassandra to identify System Alert Metrics.
  • Good experience in troubleshooting production-level issues in the cluster and its functionality.
  • Deployed Puppet, Puppet Dashboard, and Puppet DB for configuration management to existing infrastructure.
  • Installation and configuration of other open-source software like Pig, Hive, HBase, Flume and Sqoop.
  • Changed the configuration properties of the cluster based on the volume of data being processed and the performance of the cluster.
  • Working with the dev team to tune jobs; knowledge of writing Hive jobs.
  • Set up and managed NameNode HA and NameNode federation using Apache 2.0 to avoid single points of failure in large clusters.
  • Configuring Flume for efficiently collecting, aggregating and moving large amounts of log data from many different sources to HDFS.
  • Involved in collecting and aggregating large amounts of streaming data into HDFS using Flume.
  • Working with data delivery teams to set up new Hadoop users. This job includes setting up Linux users, setting up Kerberos principals and testing MFS, Hive.
  • Working with the Hortonworks support team to fine-tune the cluster.
  • Built data platforms, pipelines and storage systems using Apache Kafka, Apache Storm and search technologies such as Elasticsearch.
  • Responsible for scheduling jobs in Hadoop using the FIFO, Fair and Capacity schedulers.
  • Set up automated processes to analyze the System and Hadoop log files for predefined errors and send alerts to appropriate groups.

ENVIRONMENT: HDP 2.3, Ambari 2.x, Apache Hadoop 2.6, YARN, HDFS 2.6, Hive 1.x, Pig 0.16, Flume 1.5, Sqoop 1.4.6, Spark 2.0, ZooKeeper 3.4.6, Kafka 0.9, Tez 0.7, Ranger 0.6, Knox 0.6, Zeppelin 0.6, Kerberos 5.

Confidential

Database Developer/ Administrator

RESPONSIBILITIES:

  • Worked as DBA in production and implementation for client FORD, maintaining 200+ production, UAT and development servers, with databases including 3 TB VLDBs.
  • Worked as a 3rd line Production DBA Support.
  • On Call support for 24X7.
  • Gathering and Understanding the business requirements (Hardware and Software), Building PROD, UAT and Development servers accordingly.
  • Expert Experience in Database Architecture.
  • Installation and configuration of SQL Server 2012/2008 R2/2008/2005, and database migration from SQL Server 2000/2005 to SQL Server 2008 R2/2012.
  • Configured, monitored and troubleshot SQL Server 2012 AlwaysOn High Availability between the data centres.
  • Implemented and maintained database security (creating and maintaining logins, users and roles, and assigning privileges).
  • Involved in Disaster Recovery test.
  • Configured and monitored transactional replication with multiple subscribers and merge replication; resolved critical issues.
  • Proactive configuration and monitoring of live site production databases in a Microsoft clustered environment.
  • Experience on Optimizing Code and Improving Efficiency in databases including Re-indexing, Updating Statistics, Recompiling Stored Procedures and performing other maintenance tasks.
  • SQL Server 2005/2008 profiling with Dynamic Management Views in both Server-scoped DMV and Database-scoped DMV.
  • Successfully maintain peak performance of all primary databases by providing advanced tuning methods, customized scripts.
  • MS SQL Server database administration: configured scheduled tasks for regular database backup and maintenance activities.
  • Optimized the database by creating various clustered, non-clustered indexes and indexed views.
  • Developed, deployed and monitored SSIS Packages for new ETL Processes and upgraded the existing DTS packages to SSIS for the on-going ETL Processes.
  • Monitored Full/Incremental/Daily Loads and support all scheduled ETL jobs for batch processing. Troubleshooting of SQL job failures, DTS and SSIS packages.

ENVIRONMENT: SQL Server 2012/2008 R2/2005/2000, Windows 2K/2003/2008 R2, SQL Profiler, Tivoli, BMC Patrol, SQL Nexus, SSIS 2005/2008 R2/2012, SSRS 2005/2008 R2/2012.

Confidential, Plano, TX

Database Administrator

RESPONSIBILITIES:

  • Taking care of production servers for Windows 2000/2003 R2, SQL Server 2000, 2005 and 2008R2. Litespeed backup.
  • Used the SQL Server Profiler tool to monitor the performance of SQL Server - particularly to analyze the performance of the stored procedures.
  • As part of a team, analyzed the different high availability solutions and implemented database mirroring and replication.
  • Migrating the database from SQL Server 2000 and R2
  • As part of a team, established security policy using windows domain accounts and implemented a security strategy to protect against threats and attacks
  • Established and wrote Performance Tuning guidelines to enhance quality of our product for all current and future code development.
  • Configured and monitored transactional replication with multiple subscribers and merge replication; resolved critical issues.
  • Successfully maintain peak performance of all primary databases by providing advanced tuning methods, customized scripts.
  • Major activities include advanced performance tuning, database design, capacity planning and establishing standards for SQL installations, maintenance, tuning, and coding standards.
  • Maintaining the database consistency with DBCC at regular intervals
  • Monitored and modified Performance using execution plan and Index tuning.
  • Set up SQL Server configuration settings.
  • Implementing Automated Backup and Database Maintenance / Cleanup jobs.
  • Using log shipping for synchronization of database.
  • Created and developed the stored procedures, triggers to handle complex business rules and audit analysis.
  • Monitoring the servers using Spotlight, third-party software from Quest.
  • Scheduled and monitored all maintenance activities of SQL Server 2000, including database consistency checks and index defragmentation with DBCC INDEXDEFRAG and DBCC DBREINDEX.
  • Set up SQL Mail and created jobs to automate and schedule many tasks.

ENVIRONMENT: SQL Server 2000, 2005, 2008\R2, Windows 2K\2003\2008 Data Center, SQL Profiler, LiteSpeed Backup software, Reporting Services

Confidential

Database Developer

RESPONSIBILITIES:

  • Building DEV/PROD servers for Windows 2000/2003 R2, SQL Server 2000 and 2005.
  • Migrating the database from SQL Server.
  • Responding to system-generated action items and resolving them depending on the contract level of the customer.
  • Coordinating with the customers' vendors for any system upgrade and providing the exact procedure to follow.
  • Providing the monthly Management reports/statistics.
  • Scheduling the daily/weekly/monthly backups.
  • Patching up the system to the latest version as per the recommendations.
  • Monitor the health of the servers, Operating system, database and the network.
  • Maintenance of Hard disks (Formatting and Setup, Repair from crashes)
  • Create and maintain user accounts administering file systems and recognizing file access problems.
  • Granting the required access codes to various groups.
  • Performance tuning, maintaining disks through VERITAS Volume Manager, crontab, growing/shrinking VxFS file systems.
  • Taking disks out of VxVM, VERITAS Cluster support.
  • Performing patches research, patches installation and packages installation on Sun Solaris
  • Fine tuning of servers and configuring networks for optimum performance
  • Planning and implementing system upgrades, including hardware, operating system and periodic patch upgrades.
  • Setting up RAID levels on Sun storage equipment using VERITAS Volume Manager and Solstice DiskSuite on new/existing production/development systems for reliability, fault tolerance and availability.
  • Adding/expanding new storage to existing/new systems using VERITAS Volume Manager
  • Writing shell scripts as per requirements
  • Creating, Scheduling and Managing Cron jobs
  • Handling Support cases based on priority and SLA.

ENVIRONMENT: SQL Server 2000, 2005, 2008\R2, Windows 2K\2003\2008, Sun Solaris 8, Sun Enterprise servers E4500, E3500, Sun Fire V480, Sun Ultra SPARC 5/10, Sun Blade servers, Oracle 8i/9i, NFS, BMC Patrol and HP OpenView.
