
Big Data Developer / Engineer Resume


Phoenix, AZ

SUMMARY

  • Accomplished Software Engineer with 7+ years of experience in the Software Development Life Cycle, project management, design, development, testing, training, and client services, using several programming languages including Hive (HQL), Java, JavaScript, SQL, and Python within Agile methodology.
  • Extensive experience in installing, configuring, and using ecosystem components such as Hadoop MapReduce, HDFS, Sqoop, Parquet, Kafka, Snowflake, Hive, Impala, and Spark.
  • Good understanding of NoSQL databases like HBase and DynamoDB.
  • Experience in understanding requirements, data analysis, data quality, data mapping, testing, and deployment of business applications in highly scalable environments.
  • Strong experience in implementing data warehousing applications using ETL/ELT tools such as Informatica and Snowflake.
  • Good experience in creating procedures, packages, functions, triggers, views, tables, indexes, cursors, SQL collections, and other database objects using SQL; experienced in optimizing query performance and writing SQL queries.
  • Hands-on experience with AWS cloud services (Lambda, EC2, S3, RDS, Redshift, Data Pipeline, EMR, SageMaker, Glue).
  • Built robust automated ingestion pipelines using Python, Scala, Spark, Hive, Hadoop, Sqoop, Parquet, HDFS, and Kafka to enable the business to access all data sets in the Data Lake and Data Warehouse.
  • Worked on ETL pipelines that ingest streaming and batch data from different data sources into the Data Lake using AWS Glue, AWS Lambda, Amazon Kinesis, Kinesis Data Firehose, and S3.
  • Worked on production support, catching up data and seamlessly migrating code from Development to Testing, UAT, and Production.
  • Problem-solving mindset with experience working in Agile methodology.

TECHNICAL SKILLS

Big Data Technologies: Spark, Hive, Kafka, Sqoop, HDFS, MapReduce, Pig, GraphQL, Elasticsearch

Programming Language: SQL, JavaScript, Java, Hive (HQL), Python, Scala

SQL/NoSQL Databases: SQL Server, MySQL, Oracle, HBase, DB2

Cloud Technologies (AWS): Glue, Lambda, Athena, EC2, RDS, S3, CloudWatch, SNS, SQS, EMR, Kinesis

DevOps: Jenkins, Terraform, Docker

Other Tools: Jira, PuTTY, WinSCP, EDI (Gentran), StreamWeaver

PROFESSIONAL EXPERIENCE

Confidential, Phoenix, AZ

Big Data Developer / Engineer

Responsibilities:

  • Worked on requirement gathering and analysis, and translated business requirements into technical designs on the Hadoop ecosystem.
  • Operationalized machine learning solutions deployed on scalable infrastructure such as Kubernetes, Docker, and serverless AWS applications.
  • Provided batch processing solutions for large volumes of unstructured data using the Hadoop MapReduce framework.
  • Installed and worked on Apache to monitor Hadoop jobs.
  • Built robust automated ingestion pipelines using Python, Scala, Spark, Hive, Hadoop, Sqoop, Parquet, HDFS, and Kafka to enable the business to access all data sets in the Data Lake and Data Warehouse.
  • Created UDFs in Spark using Java according to business requirements and loaded the data into the database (sketched below).
  • Involved in implementing complex MapReduce programs to perform map-side joins using the distributed cache in Java (sketched below).
  • Performed the migration of large data sets to Databricks (Spark); created and administered clusters, loaded data, configured data pipelines, and loaded data into Databricks using ADF pipelines.
  • Wrote Java code for file reading and writing, with extensive use of the ArrayList and HashMap data structures.
  • Created various Sqoop commands to import data from an Oracle source into the Hadoop Distributed File System.
  • Created Databricks notebooks to streamline data for various business use cases and mounted Blob storage on Databricks.
  • Participated in Rapid Application Development and Agile processes to deliver new cloud platform services.
  • Hands-on experience with AWS cloud services (Lambda, EC2, S3, RDS, Redshift, Data Pipeline, EMR, SageMaker, Glue).
  • Worked on ETL pipelines that ingest streaming and batch data from different data sources into the Data Lake using AWS Glue, AWS Lambda, Amazon Kinesis, Kafka, Kinesis Data Firehose, and S3.
  • Designed and implemented automated data pipelines, data structures, algorithms, APIs, data quality checks, CI/CD design, and SQL interfaces.
  • Developed Terraform scripts to create and manage AWS services.
  • Used MapReduce programming to format the data for analysis.
  • Created Hive UDFs and used them to produce the required output (sketched below).
  • Used Oozie workflows to schedule the Hadoop jobs.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
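
A minimal sketch of the kind of Spark UDF in Java mentioned above, using Spark's Java Dataset API; the mask_account function, the raw_accounts source table, the MASKED_ACCOUNTS target table, and the JDBC connection details are hypothetical placeholders, not taken from the project:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;

public class MaskAccountUdf {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("udf-example")
                .getOrCreate();

        // Register a UDF that masks all but the last four characters of an account id.
        spark.udf().register("mask_account",
                (UDF1<String, String>) value -> {
                    if (value == null || value.length() <= 4) {
                        return value;
                    }
                    return "****" + value.substring(value.length() - 4);
                },
                DataTypes.StringType);

        // Apply the UDF and write the result to a database table via JDBC.
        Dataset<Row> masked = spark.read().table("raw_accounts")   // hypothetical source table
                .selectExpr("account_id", "mask_account(account_id) AS masked_id");

        masked.write()
                .format("jdbc")
                .option("url", "jdbc:oracle:thin:@//db-host:1521/ORCL")  // placeholder connection
                .option("dbtable", "MASKED_ACCOUNTS")
                .option("user", "etl_user")
                .option("password", "*****")
                .mode("append")
                .save();
    }
}
```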
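
A sketch of a map-side join using the distributed cache, as referenced above. It assumes the lookup file was added with job.addCacheFile(...#country_lookup.txt) so it is symlinked into the task's working directory; the file name and record layouts are hypothetical:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Map-side join: a small lookup file shipped through the distributed cache
// is loaded into a HashMap in setup(), then joined against each input record.
public class MapSideJoinMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final Map<String, String> countryLookup = new HashMap<>();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        URI[] cacheFiles = context.getCacheFiles();   // added via job.addCacheFile(...)
        if (cacheFiles != null && cacheFiles.length > 0) {
            // Hypothetical lookup file: "countryCode,countryName" per line,
            // symlinked into the working directory by the distributed cache.
            try (BufferedReader reader = new BufferedReader(new FileReader("country_lookup.txt"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] parts = line.split(",", 2);
                    if (parts.length == 2) {
                        countryLookup.put(parts[0], parts[1]);
                    }
                }
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Hypothetical input record: "customerId,countryCode,amount".
        String[] fields = value.toString().split(",");
        if (fields.length >= 3) {
            String countryName = countryLookup.getOrDefault(fields[1], "UNKNOWN");
            context.write(new Text(fields[0]), new Text(countryName + "\t" + fields[2]));
        }
    }
}
```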
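
A minimal Hive UDF sketch in Java along the lines of the bullet above; the class and function names are hypothetical, and the registration commands in the comment assume the jar has already been built and shipped to the cluster:

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Simple Hive UDF that trims and upper-cases a string column.
// Registered in Hive with (hypothetical jar/function names):
//   ADD JAR normalize-udf.jar;
//   CREATE TEMPORARY FUNCTION normalize_text AS 'NormalizeTextUdf';
public class NormalizeTextUdf extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null;
        }
        return new Text(input.toString().trim().toUpperCase());
    }
}
```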

Confidential, Hartford, CT

Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop; worked on ETL processes using Pig.
  • Worked on data analysis in HDFS using MapReduce, Hive, and Pig jobs.
  • Worked on MapReduce programming and HBase.
  • Involved in creating external tables and in partitioning and bucketing tables.
  • Ensured adherence to guidelines and standards throughout the project process.
  • Facilitated testing across different dimensions.
  • Hands-on experience with AWS cloud services (Lambda, EC2, S3, RDS, Redshift, Data Pipeline, EMR, SageMaker, Glue).
  • Used crontab to automate scripts.
  • Worked with Kafka in project deployments.
  • Wrote and modified stored procedures to load and modify data according to business rule changes.
  • Wrote Java code for file reading and writing, with extensive use of the ArrayList and HashMap data structures (sketched below).
  • Extracted the data from Teradata into HDFS using Sqoop.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Developed Hive queries to process the data and generate data cubes for visualization.
  • Worked on importing data from various sources and performed transformations using MapReduce, Pig to load data into HDFS.
  • Responsible for estimating the cluster size, monitoring, and troubleshooting of the Spark Databricks cluster.
  • Developed and maintained Big Data streaming and batch applications using Storm.
  • Configured Sqoop jobs to import data from RDBMS into HDFS using Oozie workflows.
  • Created HBase tables to store variable data formats coming from different portfolios (sketched below).
  • Worked with QA and DevOps teams to troubleshoot issues that arose during production.
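
A small sketch of the kind of Java file handling with ArrayList and HashMap described above; the file names and the "key|value" record layout are hypothetical:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Reads delimited records from a file, aggregates counts per key in a HashMap,
// keeps the raw lines in an ArrayList, and writes a summary file back out.
public class FileSummary {
    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<>();
        Map<String, Integer> countsByKey = new HashMap<>();

        // Hypothetical input: "key|value" per line.
        try (BufferedReader reader = Files.newBufferedReader(Paths.get("input.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
                String key = line.split("\\|", 2)[0];
                countsByKey.merge(key, 1, Integer::sum);
            }
        }

        try (BufferedWriter writer = Files.newBufferedWriter(Paths.get("summary.txt"))) {
            writer.write("Total records: " + lines.size());
            writer.newLine();
            for (Map.Entry<String, Integer> entry : countsByKey.entrySet()) {
                writer.write(entry.getKey() + "=" + entry.getValue());
                writer.newLine();
            }
        }
    }
}
```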
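
A sketch of creating an HBase table through the Java client, assuming the HBase 2.x API; the table name and column families are hypothetical placeholders for the portfolio data mentioned above:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

// Creates an HBase table with one column family per type of feed, so records
// with varying schemas can be stored as arbitrary column qualifiers.
public class CreatePortfolioTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {

            TableDescriptor table = TableDescriptorBuilder
                    .newBuilder(TableName.valueOf("portfolio_events"))   // hypothetical table name
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("raw"))
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("derived"))
                    .build();

            if (!admin.tableExists(table.getTableName())) {
                admin.createTable(table);
            }
        }
    }
}
```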

Confidential

Hadoop Developer

Responsibilities:

  • Developed Big Data Solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions.
  • Developed various MapReduce programs to cleanse the data and make it consumable by Hadoop (a map-only cleansing job with compressed output is sketched after this list).
  • Migrated the existing data to Hadoop from RDBMS (Oracle) using Sqoop for processing the data.
  • Worked with Sqoop export to export the data back to RDBMS.
  • Used various compression codecs to effectively compress the data in HDFS.
  • Created Hive internal and external tables with appropriate static and dynamic partitions for efficiency.
  • Used Parquet for serialization and deserialization, and implemented custom Hive UDFs involving date functions.
  • Worked on a POC to benchmark the efficiency of Avro vs Parquet.
  • Implemented the end-to-end workflow for extraction, processing, and analysis of data using Oozie.
  • Applied various optimization techniques to Hive, Pig, and Sqoop jobs.
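
A sketch tying together the MapReduce cleansing and HDFS compression bullets above: a map-only job that drops malformed records and writes Snappy-compressed output. The expected field count and delimiter are hypothetical, and Snappy stands in for whichever codec was actually used:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Map-only cleansing job: drops malformed records, trims fields, and writes
// the cleansed output compressed with Snappy.
public class CleanseJob {

    public static class CleanseMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        private static final int EXPECTED_FIELDS = 5;   // hypothetical record width

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split(",", -1);
            if (fields.length != EXPECTED_FIELDS) {
                return;                                  // skip malformed records
            }
            StringBuilder cleaned = new StringBuilder();
            for (int i = 0; i < fields.length; i++) {
                if (i > 0) {
                    cleaned.append(',');
                }
                cleaned.append(fields[i].trim());
            }
            context.write(NullWritable.get(), new Text(cleaned.toString()));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "cleanse");
        job.setJarByClass(CleanseJob.class);
        job.setMapperClass(CleanseMapper.class);
        job.setNumReduceTasks(0);                        // map-only job
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        FileOutputFormat.setCompressOutput(job, true);   // compress cleansed output
        FileOutputFormat.setOutputCompressorClass(job, SnappyCodec.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```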

Confidential

Java Developer

Responsibilities:

  • Analyzed and prioritized user and business requirements into system requirements to be included while developing the software.
  • Designed and implemented stored procedures while working with the Oracle database (invocation from Java sketched below).
  • Created database designs and performed proper database maintenance to maximize database availability and performance; actively monitored the data for accuracy and capacity and took corrective actions when needed.
  • Created entity objects (business rules and policies, validation logic, default value logic, security).
  • Created modules using bounded and unbounded task flows.
  • Handled AJAX functions (partial trigger, partial submit, auto submit).
  • Implemented the database using the Oracle database engine.
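
A minimal sketch of invoking an Oracle stored procedure from Java over JDBC, in the spirit of the stored-procedure work above; the procedure name LOAD_CUSTOMER_ORDERS, its parameters, and the connection details are hypothetical, and the Oracle JDBC driver is assumed to be on the classpath:

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Types;

// Invokes a stored procedure that loads staged orders for a customer and
// returns the number of rows processed through an OUT parameter.
public class LoadOrdersCaller {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:oracle:thin:@//db-host:1521/ORCL";    // placeholder connection details

        try (Connection conn = DriverManager.getConnection(url, "app_user", "*****");
             CallableStatement call = conn.prepareCall("{call LOAD_CUSTOMER_ORDERS(?, ?)}")) {

            call.setLong(1, 1001L);                              // IN: customer id
            call.registerOutParameter(2, Types.INTEGER);         // OUT: rows processed
            call.execute();

            System.out.println("Rows processed: " + call.getInt(2));
        }
    }
}
```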
