
Big Data Developer Resume


Piscataway, NJ

SUMMARY:

  • Over 8 years of experience in the IT industry, including 4 years in big data analytics, Hadoop and SQL database handling.
  • Strong knowledge of and development experience in the Hadoop and Big Data ecosystem, including MapReduce, HDFS, Hive, Pig, Spark, Cloudera Navigator, HBase, ZooKeeper, Storm, Sqoop, Flume, Oozie and Impala.
  • Extensive working knowledge in Analysis, Design, Development, Documentation and Deployment in handling projects.
  • Extending Hive and Pig core functionality by using custom UDF, UDTF and UDAF.
  • Strong knowledge of implementing data processing on Spark Core using Spark SQL, MLlib and Spark Streaming.
  • Experience with scripting languages (SQL, Scala and Pig) to manipulate data.
  • Worked with relational database systems (RDBMS) such as MySQL, SQL Server and Oracle, and NoSQL database systems like HBase and Cassandra.
  • Ability to design and support development of a data platform for data processing (data ingestion and data transformation) and data repository using Big Data Technologies like Hadoop stack including HDFS cluster, MapReduce, Spark, Scala, Hive and Impala.
  • Experience developing Pig Latin and HiveQL scripts for Data Analysis and ETL purposes and also extended the default functionality by writing User Defined Functions (UDFs) for data specific processing.
  • Strong SQL and Hive knowledge in query processing, optimization, execution and performance tuning.
  • Experience with Oozie Workflow Engine in running workflow jobs with actions that run Hadoop Map Reduce and Pig jobs.
  • Working with data extraction, transformation and load in Hive, Pig and HBase.
  • Working knowledge in Hadoop HDFS Admin Shell commands.
  • Good understanding of distributed systems, HDFS architecture, internal working details of MapReduce and Spark processing frameworks.
  • Experienced in running Hadoop streaming jobs to process terabytes of XML-format data.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Experience in AWS environment to develop and deploy custom Big Data applications.
  • Experience in Spark Streaming in order to ingest data from multiple data sources into HDFS.
  • Experience in converting SQL queries into Spark transformations using Spark RDDs, DataFrames and Scala, and performed map-side joins on RDDs (see the sketch after this list).
  • Performed ETL process with Spark using Scala for processing and validation of raw data logs.
  • Performed data processing in Spark by handling multiple data repositories / data sources.
  • Worked with BI tools like Tableau for report creation and further analysis from the front end.
  • Connected Tableau to Hive and generated bar charts and other visualizations based on business requirements.
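
To illustrate the SQL-to-Spark conversion and map-side (broadcast) joins called out above, here is a minimal Scala sketch; the table names, paths and columns are hypothetical, not taken from any specific project below.

```scala
import org.apache.spark.sql.{SparkSession, functions => F}

object SqlToSparkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("sql-to-spark-sketch").getOrCreate()

    // Hypothetical inputs; real jobs would read Hive tables or HDFS files.
    val orders    = spark.read.parquet("hdfs:///data/orders")
    val customers = spark.read.parquet("hdfs:///data/customers")

    // DataFrame equivalent of:
    //   SELECT c.state, COUNT(*) AS order_cnt
    //   FROM orders o JOIN customers c ON o.cust_id = c.cust_id
    //   GROUP BY c.state
    // broadcast() hints a map-side join when the customers table is small.
    val orderCounts = orders
      .join(F.broadcast(customers), Seq("cust_id"))
      .groupBy("state")
      .agg(F.count(F.lit(1)).as("order_cnt"))

    orderCounts.show()
    spark.stop()
  }
}
```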

TECHNICAL SKILLS:

Big Data: MapReduce, HBase, Kafka, Pig, Hive, Sqoop, Impala, Flume, Oozie

DB Languages: SQL, PL/SQL

Operating Systems: Windows, Unix and Linux

Programming Language: Scala, Python, C

Databases: SQL Server, Oracle, MS Access, Cassandra, HBase

Web Technologies: HTML, XML, JavaScript

IDE Development Tools: Eclipse, NetBeans

Methodologies: Agile, Scrum and Waterfall

PROFESSIONAL EXPERIENCE:

Confidential, Piscataway, NJ

Big Data Developer

Responsibilities:

  • Ingested structured, semi-structured and unstructured datasets into the data lake on an open-source Hadoop distribution, using Apache tools like Flume and Sqoop to land data in the Hive environment.
  • Handled importing of data from various data sources, performed transformations using Hive, Map Reduce, Spark and loaded data into HDFS.
  • Automated and scheduled Sqoop jobs using Unix shell scripts.
  • Involved in identifying job dependencies to design workflow for Oozie & YARN resource management.
  • Transferred data using Sqoop between HDFS and relational database systems, and vice versa.
  • Maintained and troubleshot Hadoop core and ecosystem components (HDFS, MapReduce, NameNode, DataNode, JobTracker, TaskTracker, ZooKeeper, YARN, Oozie, Hive, Hue, Flume, HBase and the Fair Scheduler).
  • Hands-on experience installing, configuring, administering, debugging and troubleshooting Apache and DataStax Cassandra clusters.
  • Tuned Hive and Pig scripts to improve performance and solved performance issues in both, with an understanding of joins, grouping and aggregation and how they translate to MapReduce jobs.
  • Created partitions and buckets based on state to support further processing with bucket-based Hive joins.
  • Developed data pipelines using Flume, Sqoop, Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Involved in scheduling Oozie workflow to automatically update the firewall.
  • Managed existing data extraction jobs while also playing a vital role in building new data pipelines from various structured and unstructured sources into Hadoop.
  • Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop, using Spark Context, Spark SQL, DataFrames and pair RDDs.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Responsible for the design and development of Spark SQL scripts based on functional specifications.
  • Responsible for Spark Streaming configuration based on the type of input source (see the sketch after this list).
  • Involved in the performance tuning and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
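
A minimal sketch of a Spark Streaming job of the kind configured here; the socket source, host, port and HDFS landing path are placeholder assumptions (the real jobs pulled from sources such as Flume and Kafka, which plug in through similar receiver APIs).

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingIngestSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("streaming-ingest-sketch")
    val ssc  = new StreamingContext(conf, Seconds(30)) // 30-second micro-batches

    // Hypothetical source: a socket stream on a placeholder host and port.
    val lines = ssc.socketTextStream("ingest-host", 9999)

    // Drop blank records and land each batch in HDFS for downstream Hive jobs.
    lines.filter(_.nonEmpty)
         .saveAsTextFiles("hdfs:///data/landing/events")

    ssc.start()
    ssc.awaitTermination()
  }
}
```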

Environment: HDFS, MapReduce, Pig, Hive, Sqoop, Flume, Oozie, HBase, Impala, Spark Streaming, YARN, Eclipse, Spring, PL/SQL, UNIX Shell Scripting, Cloudera.

Confidential, Princeton, NJ

Hadoop Developer

Responsibilities:

  • Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
  • Installed and configured Hadoop MapReduce, HDFS and Cassandra, and developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Collected the logs from the physical machines and the OpenStack controller and integrated into HDFS using Flume.
  • Worked extensively with Hive, Sqoop, Pig and shell scripting.
  • Used Oozie to schedule the workflows to perform shell action and hive actions.
  • Experience in running Hadoop streaming jobs to process terabytes of XML-format data.
  • Developed solutions for interactive, iterative and batch mode analytical needs.
  • Configured Hive and Oozie to store metadata in Microsoft SQL Server.
  • Experienced in migrating HiveQL into Impala to minimize query response time.
  • Developed and ran MapReduce jobs on YARN and Hadoop clusters to produce daily and monthly reports per users' needs.
  • Worked closely with data analysts to construct creative solutions for their analysis tasks.
  • Managed and reviewed Hadoop and HBase log files.
  • Worked on Sequence files, RC files, Map side joins, bucketing, partitioning for Hive performance enhancement and storage improvement.
  • Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases, to compare with historical data.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Involved in submitting and tracking Map Reduce jobs using Job Tracker.
  • Used Pig as an ETL tool to perform transformations, event joins, filtering and some pre-aggregations.
  • Implemented Hive generic UDFs to implement business logic.
  • Implemented Spark scripts using Scala and Spark SQL to access Hive tables in Spark for faster data processing.
  • Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Implemented a log producer in Scala that watches for application logs, transforms incremental logs and sends them to a Kafka- and ZooKeeper-based log collection platform (a sketch follows this list).
  • Experienced with Spark Context, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Imported data from different sources like HDFS and HBase into Spark RDDs.
  • Developed traits, case classes and other Scala constructs.
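
A minimal sketch of the Scala log-producer pattern described above; the broker address, topic name and sample log lines are assumptions, and the real implementation tailed application log files rather than iterating over a fixed list.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object LogProducerSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092") // placeholder broker
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    // In the real job, lines would come from tailing application log files.
    val sampleLines = Seq("app-1 INFO started", "app-1 WARN slow response")
    sampleLines.foreach { line =>
      producer.send(new ProducerRecord[String, String]("app-logs", line))
    }
    producer.close()
  }
}
```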

Environment: Hive, MapReduce, Oozie, Flume, Sqoop, SQL Server, Shell Scripting

Confidential, Plano, TX

Hadoop Developer

Responsibilities:

  • Responsible for designing and implementing an ETL process to load data from different sources, perform data mining and analyze data using visualization/reporting tools to improve system performance.
  • Collected the logs from the physical machines and integrated into HDFS using Flume.
  • Developed custom MapReduce programs to extract the required data from the logs (see the mapper sketch after this list).
  • Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
  • Automated and scheduled Sqoop jobs using Unix shell scripts.
  • Developed Map Reduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig and Sqoop.
  • Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
  • Created partitions and buckets based on state to support further processing with bucket-based Hive joins.
  • Responsible for creating Hive tables, loading the structured data resulted from MapReduce jobs into the tables and writing hive queries to further analyze the logs to identify issues and behavioral patterns.
  • Responsible for loading and transforming large sets of structured, semi-structured and unstructured data.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
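
A minimal sketch of the log-extraction mapper described above, written in Scala against the standard Hadoop MapReduce API (the project's original jobs were Java; the API works identically from Scala); the tab-delimited field layout is an assumption.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.Mapper

// Assumed log layout per line: timestamp \t user id \t event payload.
class LogExtractMapper extends Mapper[LongWritable, Text, Text, Text] {
  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, Text, Text]#Context): Unit = {
    val fields = value.toString.split("\t")
    // Keep only well-formed records; emit (user id, event payload).
    if (fields.length >= 3) {
      context.write(new Text(fields(1)), new Text(fields(2)))
    }
  }
}
```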

Environment: JDK 1.5, Hadoop, HDFS, Pig, Hive, MapReduce, HBase, Sqoop, Oozie.

Confidential

Sr. SQL Server Developer/ DBA

Responsibilities:

  • Installed, configured and maintained SQL Server 2012/2008 R2/2008/2005 in all environments. Configured SQL Server 2012 AlwaysOn availability groups with one secondary replica on Windows Server 2012.
  • Configured and maintained Transactional Replication for reporting and fixed replication issues on the servers.
  • Applied SQL Server service packs and hotfixes (cumulative updates) on SQL Servers as per requirements.
  • Implemented side-by-side migration of databases from SQL Server 2008 R2 to SQL Server 2012.
  • Worked on database pre-deployment and deployment DBA activities on production servers.
  • Worked on different high-availability strategies like log shipping and clustering (active/passive).
  • Worked on troubleshooting long-running queries, blocking, deadlock issues and tempdb issues.
  • Managed SQL Servers by performing DBA tasks like database backup/restore, creating maintenance plans (daily backups, CHECKDB, weekly index rebuilds) and scheduling jobs on a daily/weekly basis, and monitored Event Viewer and SQL error logs for errors (see the backup sketch after this list).
  • Monitored SQL Server alerts such as disk-space issues and SQL job failures, and fixed the underlying issues.
  • Configured the report server using Reporting Services Configuration Manager in SQL Server 2008 R2, and was involved in deploying and delivering reports using SSRS and troubleshooting any errors occurring during execution.
  • Involved in Capacity Planning and adding new storage space as needed.
  • Monitored the performance of different production servers using native SQL tools and Windows tools like Performance Monitor and SQL Profiler to optimize queries and enhance database server performance.
  • Responsible for monitoring and making recommendations for performance improvement in databases. This involved index creation, index removal, index modification, and adding scheduled jobs to re-index and update statistics in databases.
  • Maintained database security and created linked servers.
  • Updated SQL account and service account passwords prior to expiry on all servers, per audit requirements.
  • Documenting all the issues encountered and all the important activities accomplished in the environment.
  • Used the Remedy ticketing tool to resolve database-related incidents and requests within defined SLAs and to respond to database-related alert escalations.
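
A minimal sketch of the full-backup step from the maintenance plans above; the database name, file path and credentials are placeholders, and the T-SQL is shown issued over JDBC from Scala to stay consistent with the earlier examples (in practice this runs as a SQL Server Agent maintenance-plan job).

```scala
import java.sql.DriverManager

object BackupSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder connection string and credentials.
    val url  = "jdbc:sqlserver://dbserver:1433;databaseName=master"
    val conn = DriverManager.getConnection(url, "backup_user", "secret")
    try {
      // Standard T-SQL full backup with compression and checksum verification.
      conn.createStatement().execute(
        """BACKUP DATABASE SalesDB
          |TO DISK = N'E:\Backups\SalesDB_full.bak'
          |WITH COMPRESSION, CHECKSUM, INIT""".stripMargin)
    } finally conn.close()
  }
}
```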

Environment: SQL Server 2005/2008/2008 R2/2012, T-SQL, SSIS, SSAS, SSRS, SAN, Red Gate, LiteSpeed, ASM, MS Clustering, Crystal Reports, Citrix, MS Excel, XML, Replication, Windows 2008 R2/2008/2003/NT/XP.

Confidential

SQL Server Developer

Responsibilities:

  • Created tables, indexes, sequences, constraints, stored procedures and triggers to implement business rules.
  • Installation of SQL Server on Development and Production Servers, setting up databases, users, roles and permissions.
  • Extensively involved in SQL joins, subqueries, tracing and performance tuning for faster-running queries.
  • Provided documentation about database/data warehouse structures and Updated functional specification and technical design documents.
  • Designed and created ETL packages using SSIS to transfer data from heterogeneous sources in different file formats (Oracle, SQL Server and flat files) to SQL Server destinations.
  • Worked with several Data Flow transformations in SSIS, including Derived Column, Slowly Changing Dimension, Lookup, Fuzzy Lookup, Data Conversion, Conditional Split and many more.
  • Created various reports with drill-down, drill-through and calculated members using SQL Server Reporting Services.
  • Used various report items like tables, subreports and charts to develop reports in SSRS, and uploaded them to Report Manager.
  • Created complex stored procedures, triggers, functions, indexes, tables, views, SQL joins and other T-SQL code to implement business rules (see the sketch after this list).
  • Used Performance Monitor and SQL Profiler to optimize queries and enhance the performance of database servers.
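
A minimal sketch of deploying one such business-rule stored procedure, again with the T-SQL issued over JDBC from Scala for consistency; the procedure name, table and columns are illustrative only, not from the original project.

```scala
import java.sql.DriverManager

object DeployProcSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder connection string and credentials.
    val url  = "jdbc:sqlserver://dbserver:1433;databaseName=SalesDB"
    val conn = DriverManager.getConnection(url, "deploy_user", "secret")
    try {
      // Hypothetical procedure: fetch orders shipped to a given state.
      conn.createStatement().execute(
        """CREATE PROCEDURE dbo.usp_GetOrdersByState
          |    @State CHAR(2)
          |AS
          |BEGIN
          |    SET NOCOUNT ON;
          |    SELECT OrderID, CustomerID, OrderTotal
          |    FROM dbo.Orders
          |    WHERE ShipState = @State;
          |END""".stripMargin)
    } finally conn.close()
  }
}
```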

Environment: MS SQL Server 2008 R2/2008, T-SQL, SQL Server Reporting Services (SSRS), SSIS, SSAS, Business Intelligence Development Studio (BIDS), MS Excel, Visual Studio Team Foundation Server, VB Script
