
Sr. Big Data Developer Resume

NJ

SUMMARY

  • 5+ years of IT experience in design and development using the Big Data ecosystem (Sqoop, Hive, Impala, Flume, Spark, Scala) and prescriptive analytics using Python and R
  • Experience working in environments using Agile (Scrum), RUP and Test-Driven Development methodologies.
  • Good knowledge of Amazon AWS services such as EMR, Kinesis and S3, which provide fast and efficient processing
  • Experience with different Hadoop distributions such as Cloudera CDH5 and Hortonworks.
  • Good understanding of NoSQL databases and hands-on experience writing applications on MongoDB.
  • Experience importing and exporting data between RDBMS and HDFS using Sqoop
  • Created UDFs in Hive and implemented a data lake architecture to consume transactional data.
  • Created data quality check scripts that run after Spark and Hive jobs.
  • Implemented ETL operations to move legacy data into the Hadoop data lake
  • Created StreamSets pipelines and developed Control-M jobs.
  • Strong experience analyzing large data sets by writing PySpark scripts, HiveQL and Spark-SQL, and using Spark Core to write Scala transformations (see the sketch after this list).
  • Worked on various file formats such as Avro, Parquet and JSON.
  • Involved in moving all log files generated from various sources to HDFS and Spark for further processing.
  • Implemented a data migration from MySQL to NoSQL (Cassandra) using Spark
  • Code check-in and version control using Git and BitBucket.
  • Highly skilled in Data analytics and statistical computing using Python, R, SAS, Excel
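
For illustration, a minimal PySpark sketch of the kind of HiveQL/Spark-SQL analysis described above; the application name, HDFS paths and column names are hypothetical placeholders rather than details from any of the projects below.

```python
# Minimal PySpark sketch: read data landed on HDFS and run a
# Spark-SQL/HiveQL-style aggregation. Paths and columns are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("claims-analysis")
         .enableHiveSupport()
         .getOrCreate())

# Raw data previously landed on HDFS (Parquet assumed here)
claims = spark.read.parquet("hdfs:///data/raw/claims")
claims.createOrReplaceTempView("claims")

# Aggregate with Spark SQL
summary = spark.sql("""
    SELECT claim_type,
           COUNT(*)          AS claim_count,
           SUM(claim_amount) AS total_amount
    FROM claims
    GROUP BY claim_type
""")

# Persist the curated output for downstream consumers
summary.write.mode("overwrite").parquet("hdfs:///data/curated/claims_summary")
```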

TECHNICAL SKILLS

  • Sqoop, StreamSets
  • HDFS, HBase
  • MapReduce, Hive, Spark, Impala
  • Control-M
  • Hue, Ambari
  • Zookeeper
  • MySQL, DB2, Cassandra, HBase
  • Linux (Ubuntu), Windows
  • Python, R, Scala, SAS
  • Cloudera, Hortonworks

PROFESSIONAL EXPERIENCE

Sr. Big Data Developer

Confidential, NJ

Responsibilities:

  • Developed Spark jobs to optimize and process the business data.
  • Served as Onsite-lead to handle critical delivery of PROD data.
  • Managed the SDLC process with non-data teams for design reviews and implementation plan approvals
  • Investigated root causes of PROD failures and implemented fixes.
  • Developed SQL joins for critical data in collaboration with BSAs.
  • Developed Scala code involving SQL joins to release data in an optimized way.
  • Created entity files in JSON format from the Spark Scala code.
  • Prepared crosswalk files containing the rules for data delivery.
  • Implemented a data lake on HBase and Hive to consume external data.
  • Created Hive tables, loaded claims data from Oracle using Sqoop and loaded the processed data into the target database.
  • Created Hive external tables and loaded data from JSON and text files into ORC format (see the sketch after this list).
  • Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark-SQL, DataFrames and pair RDDs.
  • Loaded the data into Spark RDDs and performed in-memory computation to generate output as per the requirements.
  • Performed data analytics in Hive and then exported those metrics back to Oracle Database using Sqoop
  • Exported data from HDFS to MySQL via Sqoop for Business Intelligence, visualization and user report generation.
  • Optimized join conditions for efficient data loads into the data lake.
  • Handled and resolved defects raised by the QA team.
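
Below is a hedged sketch of the JSON-to-ORC load behind the Hive external tables described above, shown in PySpark for consistency with the other examples here (the project itself used Scala); the paths, table name and schema are assumptions.

```python
# Hypothetical PySpark sketch of a JSON-to-ORC load for a Hive external table.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("claims-json-to-orc")
         .enableHiveSupport()
         .getOrCreate())

# Source data landed on HDFS as JSON
raw = spark.read.json("hdfs:///landing/claims/json/")

# Keep only the columns the external table exposes and drop duplicate keys
cleaned = (raw.select("claim_id", "claim_amount", "claim_date")
              .dropDuplicates(["claim_id"]))

# Write ORC files to the external table location
cleaned.write.mode("overwrite").orc("hdfs:///warehouse/external/claims_orc/")

# External Hive table defined over the ORC location
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS claims_orc (
        claim_id     STRING,
        claim_amount DOUBLE,
        claim_date   STRING
    )
    STORED AS ORC
    LOCATION 'hdfs:///warehouse/external/claims_orc/'
""")
```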

Environment: Hadoop, Spark, YARN, Scala, MapReduce, Hive, HDFS, ORC, JSON, BitBucket, ETL, Flume, Tableau, GIT

Big Data/Hadoop Developer

Confidential, NJ

Responsibilities:

  • Created Spark and Hive jobs to optimize and process the business data.
  • Implemented ETL operations that provide a centralized data access in a standard way.
  • Developed UNIX scripts to extract data from structured, semi-structured and unstructured data files to load into HDFS and HIVE.
  • Moved legacy data from various external source systems to Hadoop Data Lake.
  • Created Streamset pipelines to control drift in data transfer from source system to Data Lake.
  • Developed Workload Automation using Control-M.
  • Developed PySpark code to replace existing SQL Stored Procedures.
  • Created Data Quality check scripts.
  • Worked on various file formats such as Avro, Parquet and JSON.
  • Implemented Kafka consumer for real time data transfer.
  • Created re-usable modules in PySpark (see the sketch after this list).
  • Code check-in and version control using BitBucket.
  • Participated in daily scrum meetings and worked in story-driven agile development environment.
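
An illustrative sketch of a re-usable PySpark data-quality module of the kind referenced above; the check names, key columns and thresholds are assumptions, not the project's actual rules.

```python
# Illustrative re-usable data-quality checks; rule set and thresholds are assumed.
from pyspark.sql import DataFrame
from pyspark.sql import functions as F


def null_check(df: DataFrame, columns: list) -> dict:
    """Count nulls per key column in a single pass over the DataFrame."""
    counts = df.select(
        [F.sum(F.col(c).isNull().cast("int")).alias(c) for c in columns]
    ).first()
    return {c: counts[c] for c in columns}


def row_count_check(df: DataFrame, expected_min: int) -> bool:
    """Flag extracts that are suspiciously small."""
    return df.count() >= expected_min


def run_quality_checks(df: DataFrame, key_columns: list, expected_min: int) -> bool:
    """Run all checks and report a single pass/fail result."""
    nulls = null_check(df, key_columns)
    passed = row_count_check(df, expected_min) and all(v == 0 for v in nulls.values())
    if not passed:
        print(f"Data quality check failed: null counts = {nulls}")
    return passed
```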

Environment: Hadoop, Spark, YARN, PySpark, Python, MapReduce, Hive, HBase, Kafka, Shell Script, Control-M, StreamSets, HDFS, Sqoop, Avro, Parquet, JSON, BitBucket.

Spark Developer

Confidential

Responsibilities:

  • Wrote Programs in Spark using Scala and Python for Data quality check.
  • Worked on Big Data infrastructure for batch processing and real time processing. Built scalable distributed data solutions using Hadoop.
  • Wrote transformations and actions on DataFrames and used Spark SQL on DataFrames to access Hive tables in Spark for faster data processing.
  • Imported and exported terabytes of data using Sqoop and ingested real-time data using Flume.
  • Created various Hive external tables and staging tables and joined the tables as per the requirements.
  • Implemented static partitioning, dynamic partitioning and bucketing in Hive using internal and external tables.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala (see the sketch after this list).
  • Used Hive for transformations, joins, filters and pre-aggregations after storing the data in HDFS
  • Used Amazon EMR for processing big data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2)
  • Designed ETL workflows in Tableau, deployed data from various sources to HDFS and generated reports using Tableau.
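
A hedged sketch of converting a HiveQL query into equivalent Spark DataFrame transformations, as referenced above; the table and column names are assumed for the example.

```python
# Hive-to-Spark conversion sketch; "orders" and "customers" are assumed Hive tables.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hive-to-spark-conversion")
         .enableHiveSupport()
         .getOrCreate())

# Original HiveQL-style query
hive_style = spark.sql("""
    SELECT c.region, SUM(o.amount) AS total_amount
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    WHERE o.status = 'COMPLETE'
    GROUP BY c.region
""")

# Equivalent Spark DataFrame transformations
orders = spark.table("orders")
customers = spark.table("customers")

df_style = (orders.filter(F.col("status") == "COMPLETE")
            .join(customers, "customer_id")
            .groupBy("region")
            .agg(F.sum("amount").alias("total_amount")))

# Both forms produce the same result; the DataFrame form is easier to compose in code
df_style.show()
```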

Environment: Hadoop 2.8, MapReduce, HDFS, Yarn, Hive 2.1, Sqoop 1.1, Cassandra 2.7, Oozie, Spark, Scala, Python, AWS, Flume 1.4, Kafka, Tableau, Linux, Shell Scripting.

Data Analyst

Confidential

Responsibilities:

  • Helped set up a data integration layer and performed data profiling, consolidating data from multiple data sources and ensuring data consistency through basic data cleansing and standardization of data elements
  • Developed storytelling dashboards in Tableau Desktop and published them to Tableau Server, allowing end users to understand the data on the fly with quick filters for on-demand information
  • Worked with Business Leads, Solutions Architect, and other Scrum Team members in an agile process to deliver high quality and efficient Tableau dashboards and ad-hoc reporting capabilities.
  • Interacted with business users and other project stakeholders to gather requirements and define the look and feel of the dashboards.
  • Built user-friendly visualizations using most of the Tableau Desktop features including actions, parameters, LOD calculations, filters and various others.
  • Optimized Tableau dashboards with a focus on usability, performance, flexibility, testability, and standardization
  • Created repeatable processes for data extraction, enhancement and analysis activities in Tableau
  • Worked with the Tableau command line tools tabcmd and tabadmin for managing scheduled tasks, administering users and groups, troubleshooting, performance tuning, and general systems maintenance.
  • Wrote numerous views using SQL to support reporting needs.
  • Shared perspectives with team members, resolved any issues and came to agreement quickly.

Environment: Tableau Desktop 9.0/9.2, Tableau Server 9.0, SQL Server 2008 R2/2012, ASP, HTML, Windows, Microsoft Office suite
