Big Data Developer Resume

Pleasanton, CA


  • Overall 8 years of IT experience across a variety of industries, including 3+ years in Big Data analytics and development, with good knowledge of the Hadoop framework and parallel processing implementation.
  • Experience with the Hadoop ecosystem (HDFS, MapReduce, Hive, Pig, Sqoop, YARN) and AWS.
  • Proficient in all phases of software engineering, including analysis, design, coding, testing and implementation, as well as Agile methodologies.
  • Experience with Cloudera and Hortonworks distributions.
  • Highly experienced in importing and exporting data between HDFS and relational database management systems using Sqoop.
  • Experience in developing Spark applications using the Spark RDD, Spark SQL and DataFrame APIs.
  • Proficient in developing data transformation and other analytical applications in Spark and Spark SQL using Python (PySpark).
  • Familiar with various file formats, such as delimited text files, clickstream log files, Apache log files, Avro, JSON and XML files.
  • Good knowledge of OOP concepts and design patterns.
  • Involved in preparing ETL mapping specification documents and transformation rules for the mappings.
  • Good understanding of dimensional modeling (star and snowflake schemas, SCD Types 1, 2 and 3) and of data modeling at the logical and physical levels.
  • Good knowledge of scalable, secure cloud architecture based on Amazon Web Services (AWS EMR clusters, EC2, S3, etc.).
  • Hands-on experience with Spark Streaming connected to a Kafka cluster.
  • Architected, designed and maintained high-performing ETL processes.
  • Hands-on experience with the build automation tools Maven and Jenkins.
  • Worked with version control tools such as Bitbucket, Git and SVN.
  • Experienced with scripting languages such as Python and shell.
  • Good experience with SQL, PL/SQL and database concepts.
  • Exceptional ability to learn new technologies and to deliver results on short deadlines.
  • Team player with good interpersonal and problem-solving skills; able to work both in a team and independently.


Big Data Ecosystems: Hadoop, MapReduce, Spark, HDFS, HBase, Pig, Hive, Sqoop, Kafka, Hue, Cloudera, Hortonworks, Oozie and Airflow.

Spark Technologies: Spark SQL, Spark DataFrames and RDDs

Scripting Languages: Python and shell scripting

Programming Languages: Java, Python, SQL, PL/SQL

Cloud Technologies: AWS EMR, EC2 and S3

Databases: Oracle 12c, MySQL and Microsoft SQL Server

NoSQL Technologies: HBase

BI tools: Tableau, Kibana

Web Technologies: HTML, CSS, XML, SOAP, and REST.

Development Tools: Eclipse, PyCharm, Git, ANT, Maven, Jenkins, Bamboo, SOAP UI, QC, Jira, Bugzilla

Methodologies: Agile/Scrum, Waterfall

Operating Systems: Windows XP/7/8/10, UNIX, Linux.


Confidential, Pleasanton, CA

Big Data Developer


  • Actively worked with business client SMEs to gather requirements for project planning and development.
  • Coordinated with Landing Zone SMEs throughout the SDLC phases.
  • Collaborated with the source team to examine the details by referring to the technical specification document.
  • Involved in designing and developing ingestion and refinement frameworks.
  • Involved in designing the incremental approach based on the specific requirements of each use case.
  • Ingested data from various sources such as Oracle and DB2 into the HDFS raw zone using Sqoop as part of the ingestion framework.
  • Optimized the Sqoop ingestion jobs by analyzing the number of mappers and the degree of parallelism.
  • Performed data cleansing and data manipulation on data received as flat files.
  • Handled and ingested flat-file data using SFTP and BCP operations.
  • Performed data ingestion in different file formats such as Avro, Parquet and ORC.
  • Integrated the scripts to handle various databases and their ingestion processes.
  • Developed test cases and validated the ingested files in the HDFS raw zone.
  • Fine-tuned Hive queries for better performance.
  • Developed Hive scripts using Spark SQL to denormalize and aggregate the data in refinement jobs.
  • Performed data manipulation using Spark DataFrames.
  • Optimized the Hive queries in refinement by applying partitioning and bucketing where required.
  • Handled skew in a Hive view query by using a random function, reducing the execution time from 6 hours to 1 hour.
  • Optimized the Spark SQL jobs by repartitioning the data and tuning the memory parameters.
  • Developed Spark UDFs in Python to perform percentage calculations.
  • Reduced Hive query execution time from 5 hours to 1.5 hours by performing operations on a partitioned table.
  • Analyzed and fixed production issues raised by the business team.
  • Provided production support for use cases that are live in the PROD environment.
  • Used IBM Tivoli Workload Scheduler and crontab to execute the data workflows.
  • Worked with Git version control and Jenkins to build projects.

Confidential, San Jose, CA

Big Data Engineer


  • Developed multiple jobs in Pig for data cleaning and processing.
  • Implemented custom incremental logic for loading the payload data using Pig.
  • Performed various joins on staging tables to handle insert and update records.
  • Used the Avro file format in Pig Latin to load and store data.
  • Developed Pig scripts to convert the data from Avro to text file format.
  • Created Hive external tables for analytical querying of the data present in HDFS.
  • Worked on partitioning the Hive tables and running the scripts in parallel to reduce their run time.
  • Developed workflows in Oozie to automate loading the data into HDFS and pre-processing it with Pig.
  • Created custom scripts to validate the data present in HDFS.
  • Monitored and analyzed the YARN application logs.

Confidential, St Louis, MO

Big Data Engineer


  • Developed data pipelines using Spark and Hive on Amazon EMR clusters.
  • Developed Spark code to pull the data present in S3 buckets for faster data processing.
  • Ingested data from S3 and analyzed it using Spark (DataFrames and Spark SQL) and a series of Hive scripts to produce summarized results for downstream systems.
  • Worked on Airflow to create the workflow orchestration of the entire data pipeline.
  • Created the required Airflow DAGs using the various Airflow operators needed to orchestrate the workflow.
  • Developed a Python application to pull data from project-related REST APIs.
  • Built dashboards using Kibana (on AWS) over the downstream data present in S3 buckets.
  • Implemented installation and configuration of a multi-node cluster in the cloud using Amazon Web Services (AWS) on EC2.
  • Developed a file watcher application to handle ingestion of the data present in S3.


Hadoop Developer


  • Developed the Confidential MFP analytics reporting application using Hive.
  • Developed a Hadoop-based solution for customer-care call prediction.
  • Developed shell scripts for cleaning, validating and transforming the data.
  • Implemented Hive generic UDFs to incorporate business logic into Hive queries.
  • Imported data from different data sources into HDFS using Sqoop, performed transformations using Hive, and then loaded the data into HDFS.
  • Exported the analyzed data to relational databases using Sqoop for the downstream data marts.
  • Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
  • Scheduled data extracts on a daily basis.
  • Monitored workload and job performance, and performed capacity planning, using Cloudera Manager.


BI Developer


  • Participated in requirement gathering and converted the requirements into technical specifications.
  • Developed SQL queries and stored procedures using PL/SQL to retrieve data from and insert data into multiple database schemas.
  • Created views to simplify user interface implementation, and triggers on them to enforce consistent data entry into the database.
  • Involved in performance tuning of T-SQL queries.
  • Performed data migration (import and export with BCP) from text files to SQL Server.
  • Generated ad hoc reports using MS Excel and Crystal Reports.
  • Involved in the design, analysis, implementation, testing and support of ETL processes for Stage, ODS and Mart.
  • Tested the web services and WSDL using SOAP UI.