
Senior Big Data Developer Resume


Louisville, KY

PROFESSIONAL SUMMARY:

  • 8 years of total IT experience, including Java application development, database management, and Big Data technologies using the Hadoop ecosystem.
  • 6 years of experience in Big Data analytics using various Hadoop ecosystem tools and the Spark framework.
  • Solid understanding of Distributed Systems Architecture, MapReduce and Spark execution frameworks for large scale parallel processing.
  • Worked extensively on Hadoop ecosystem components: MapReduce, Pig, Hive, HBase, Flume, Sqoop, Hue, Oozie, Spark, and Kafka.
  • Experience working with all major Hadoop distributions, including Cloudera (CDH), Hortonworks (HDP), and AWS EMR.
  • Developed highly scalable Spark applications using Spark Core, DataFrames, Spark SQL, and the Spark Streaming APIs in Scala.
  • Gained good experience troubleshooting and fine-tuning Spark Applications.
  • Experience working with DStreams in Spark Streaming, accumulators, broadcast variables, and various levels of caching and optimization techniques in Spark.
  • Worked on real-time data integration using Kafka, Spark Streaming, and HBase.
  • In-depth understanding of NoSQL databases such as HBase and its Integration with Hadoop cluster.
  • Strong working experience in extracting, wrangling, ingesting, processing, storing, querying, and analyzing structured, semi-structured, and unstructured data.
  • Solid understanding of the Hadoop MRv1 and MRv2 (YARN) architectures.
  • Developed, deployed, and supported several MapReduce applications in Java to handle semi-structured and unstructured data.
  • Sound knowledge of map-side joins, reduce-side joins, shuffle & sort, distributed cache, compression techniques, and multiple Hadoop input & output formats. Used Docker coupled with the Nginx load balancer to achieve continuous delivery in a highly scalable environment.
  • Solid experience working with CSV, text, SequenceFile, Avro, Parquet, ORC, and JSON data formats.
  • Expertise in working with the Hive data warehouse tool: creating tables, distributing data through static and dynamic partitioning, bucketing, and optimizing HiveQL queries (a minimal sketch of this partitioning pattern follows this list).
  • Involved in ingesting structured data from SQL Server, MySQL, and Teradata into HDFS and Hive using Sqoop. Experience writing ad-hoc queries in Hive and analyzing data using HiveQL.
  • Extensive experience in performing ETL on structured, semi-structured data using Pig Latin Scripts.
  • Expertise in moving structured schema data between Pig and Hive using HCatalog.
  • Proficient in creating Hive DDLs and UDFs. Designed and implemented Hive and Pig UDFs in Python and Java for evaluating, filtering, loading, and storing data.
  • Experience in migrating data using Sqoop from HDFS and Hive to relational database systems and vice versa, according to client requirements.
  • Virtualized servers using Docker for test- and dev-environment needs, and automated configuration using Docker containers.
  • Experienced in working with Amazon Web Services (AWS), using EC2 for compute and S3 for storage. Aware of Kerberos.
  • Experienced in job workflow scheduling and monitoring tools like Oozie.
  • Proficient knowledge and hands on experience in writing shell scripts in Linux.
  • Developed core modules in large cross-platform applications using Java, JSP, Servlets, Hibernate, RESTful services, JDBC, JavaScript, XML, and HTML.
  • Experience in creating Docker containers leveraging existing Linux containers and AMIs, in addition to creating Docker containers from scratch.
  • Extensive experience in developing and deploying applications using WebLogic, Apache Tomcat, and JBoss. Worked on Podium and Talend.
  • Development experience with RDBMS, including writing SQL queries, views, stored procedures, and triggers, as well as data lake concepts.
  • Strong understanding of Software Development Lifecycle (SDLC) and various methodologies (Waterfall, Agile).
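
To make the Hive partitioning bullet above concrete, here is a minimal, illustrative PySpark sketch (not project code): it writes raw CSV out as a Parquet-backed Hive table partitioned by two columns, then runs a partition-pruned HiveQL aggregation. The paths, database/table name (analytics.sales), and columns (region, sale_date, amount) are hypothetical.

    # Minimal sketch: partitioned Hive table from raw CSV, then a partition-pruned query.
    # Paths, table name, and columns are placeholders.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-partitioning-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Read raw CSV files into a DataFrame.
    raw = (spark.read
           .option("header", "true")
           .option("inferSchema", "true")
           .csv("/data/raw/sales"))

    # Write a Parquet-backed Hive table partitioned by region and sale_date.
    (raw.write
        .mode("overwrite")
        .format("parquet")
        .partitionBy("region", "sale_date")
        .saveAsTable("analytics.sales"))

    # Partition pruning: only the sale_date=2021-06-01 partitions are scanned.
    spark.sql("""
        SELECT region, SUM(amount) AS total_amount
        FROM analytics.sales
        WHERE sale_date = '2021-06-01'
        GROUP BY region
    """).show()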

TECHNICAL SKILLS:

Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Hue, Ambari, Zookeeper, Kafka, Apache Spark, Spark Streaming, Impala, HBase

Hadoop Distributions: Cloudera, Hortonworks, Apache, AWS EMR, Docker, Databricks

Languages: C, Java, PL/SQL, Python, Pig Latin, Hive QL, Scala, Regular Expressions

IDE, Build & Design Tools: Eclipse, NetBeans, IntelliJ, PyCharm, JIRA, Microsoft Visio

Web Technologies: HTML, CSS, JavaScript, XML, JSP, RESTful, SOAP

Operating Systems: Windows (XP, 7, 8, 10), UNIX, Linux, Ubuntu, CentOS

Reporting Tools: Tableau, Power View for Microsoft Excel, Talend, MicroStrategy

Databases: Oracle, SQL Server, MySQL, MS Access, NoSQL Database (HBase, Cassandra, MongoDB), Teradata, IBM DB2

Build Automation tools: SBT, Ant, Maven

PROFESSIONAL EXPERIENCE:

Confidential, Louisville, KY

Senior Big Data Developer

Responsibilities:

  • Strong understanding of and extensive experience in developing PySpark and Spark-Scala applications; able to lead development and maintenance independently.
  • Working knowledge of Apache Spark and Spark Streaming with Kafka as an input source.
  • Experience in configuring Kafka brokers, consumers and producers for optimal performance.
  • Good understanding of the internals of Kafka design, message compression and replication.
  • Experience in integrating Kafka with other tools for logging and packaging.
  • Worked closely with the Meijer data and gained in-depth understanding and knowledge of retail domain.
  • Created private cloud using Kubernetes that supports DEV, TEST, and PROD environments.
  • Successfully led the development, deployment, and maintenance of Meijer's householding process.
  • In-depth knowledge and hands-on experience with Apache Hadoop components such as HDFS, Hive/HiveQL, Pig, and Sqoop.
  • Experience using Amazon Web Services including Kinesis, Lambda, SQS/SNS, S3, RDS
  • Experience in reading from and writing data to Amazon S3 in Spark Applications.
  • Experience in selecting and configuring the right Amazon EC2 instances and accessing key AWS services using client tools.
  • Built and maintained Docker container clusters managed by Kubernetes on GCP, using Linux, Bash, Git, and Docker.
  • Worked with the OpenShift platform in managing Docker containers and Kubernetes clusters.
  • Knowledge of using AWS Identity and Access Management (IAM) to secure access to EC2 instances and of configuring auto-scaling groups using CloudWatch.
  • Capable of processing large sets of structured and semi-structured data. Processed text and Avro file formats and performed transformations using Spark RDDs/DataFrames.
  • Perform Hive querying on an ad-hoc basis to resolve customer queries and to generate reports as required.
  • Used CI/CD tools Jenkins, Git/GitLab, Jira, and the Docker registry/daemon, with configuration management and automation using Ansible.
  • Processed unstructured log files with PySpark to generate statistics and loaded the results into Hive tables (sketched below, after the Environment line).
  • Experience in executing Hive queries using Hue, Impala and Hive shell. In-depth experience in writing complex Hive queries using Joins, nested queries, aggregations and window functions.
  • Created Hive tables, dynamic partitions, buckets, and working on them using HiveQL
  • In-depth experience in automating various PySpark, Hive, Bash and Scala applications using Airflow and Oozie
  • Trained staff on effective use of Jenkins, Docker, GitLab and Kubernetes
  • Worked extensively on Airflow and successfully deployed an Airflow DAG of 64 task instances comprising both parallel and sequential executions (a minimal DAG sketch follows this list).
  • Loaded data into HDFS from dynamically generated files and relational database management systems using Sqoop.
  • Migrated Hive projects with multiple Hive queries to PySpark transformations/actions using RDDs and DataFrames.
  • Good knowledge on GIT commands, version tagging and pull requests
  • Performed unit testing and integration testing after development and participated in code reviews.
  • Interact with business analysts to understand the business requirements and translate them to technical requirements
  • Collaborate with various technical experts, architects and developers for design and implementation of technical requirements
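
As an illustration of the Airflow orchestration described above, the following is a minimal, hypothetical DAG that mixes sequential and parallel task execution. It assumes Airflow 2.x import paths and spark-submit wrapper scripts; the task names, scripts, and schedule are placeholders rather than the production householding pipeline.

    # Hypothetical Airflow DAG: one extract task fans out to two parallel transforms,
    # which both feed a final load task. Airflow 2.x imports assumed.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    default_args = {
        "owner": "data-eng",
        "retries": 1,
        "retry_delay": timedelta(minutes=5),
    }

    with DAG(
        dag_id="example_pipeline",
        start_date=datetime(2021, 1, 1),
        schedule_interval="0 2 * * *",   # daily at 02:00
        catchup=False,
        default_args=default_args,
    ) as dag:
        extract = BashOperator(task_id="extract_sales",
                               bash_command="spark-submit extract_sales.py")
        clean = BashOperator(task_id="clean_profiles",
                             bash_command="spark-submit clean_profiles.py")
        enrich = BashOperator(task_id="enrich_events",
                              bash_command="spark-submit enrich_events.py")
        load = BashOperator(task_id="load_hive",
                            bash_command="spark-submit load_hive.py")

        # extract runs first, clean and enrich run in parallel, load runs last.
        extract >> [clean, enrich] >> load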

Environment: Airflow, Spark 1.6.0, Spark 2.2, Python 2.7, Python 3, Kubernetes, Scala 2.10.5, Hadoop 2.6.0-cdh5.7.0, cdh5.13.1, cdh1.15.0, Java 1.8.0_92, Spark SQL, R, MongoDB, Visual Studio, PyCharm, Apache Hive 1.1.0, HDFS, Oozie, Maven, Docker, IntelliJ, Git
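
A minimal sketch of the log-processing pattern mentioned above: parse unstructured log lines with PySpark, aggregate per log level, and load the statistics into a Hive table. The regex, input path, and table name are assumptions for illustration only.

    # Sketch: unstructured logs -> parsed Rows -> per-level counts -> Hive table.
    import re

    from pyspark.sql import Row, SparkSession

    spark = (SparkSession.builder
             .appName("log-stats-sketch")
             .enableHiveSupport()
             .getOrCreate())

    # Assumed log layout: "<date> <time> <LEVEL> <message>".
    LOG_RE = re.compile(r"^(?P<ts>\S+ \S+) (?P<level>[A-Z]+) (?P<msg>.*)$")

    def parse(line):
        m = LOG_RE.match(line)
        return Row(ts=m.group("ts"), level=m.group("level"), msg=m.group("msg")) if m else None

    parsed = (spark.sparkContext.textFile("/data/raw/app_logs/*.log")
              .map(parse)
              .filter(lambda r: r is not None))

    # Per-level counts, persisted as a Hive table for downstream reporting.
    stats = spark.createDataFrame(parsed).groupBy("level").count()
    stats.write.mode("overwrite").saveAsTable("ops.log_level_stats")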

Confidential, Framingham, MA

Sr. Application Developer (Spark)

Responsibilities:

  • Developed and reviewed Spark code spanning Airflow DAGs, Databricks notebooks, Delta table DDLs, metadata SQL, and other SQL scripts.
  • Deployed code to Dev, QA, PreProd, and Prod environments, adhering to the Git process flow and following the standards defined by the release management process.
  • Creating Technical Design Documentation and Support/OPS Turnover documentation by following the OPS checklist.
  • Raised Change Requests once the code reached PreProd.
  • Handled Airflow orchestration, especially configuring the DAG start date, schedule, and other parameters.
  • Managed local deployments in Kubernetes, creating local cluster and deploying application containers.
  • Managed containers using Docker by writing Dockerfiles, set up automated builds on Docker Hub, and installed and configured Kubernetes.
  • Worked mainly on developing PySpark code in Databricks using existing load patterns (Full, Incremental, and Backfill) for Region- and Country-level forecasting of rawCustomerSales and pubCustomerSales.
  • Wrote Spark DataFrame jobs that mainly use CSV, Parquet, and Delta file formats. Used Spark SQL, joins, views, and partitioning extensively (an incremental-load sketch follows this list).
  • Validated source data and generated output data in the required format using PySpark transformations.
  • Submitted jobs to clusters administered by other Linux teams.
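
As a rough illustration of the incremental load pattern above, this sketch merges a newly landed daily slice into a Delta table using the Delta Lake Python API on Databricks. The landing path, the target table name, and the sale_id join key are assumptions, not the project's actual schema.

    # Sketch of an incremental (upsert) load into a Delta table; names are placeholders.
    from delta.tables import DeltaTable
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("incremental-load-sketch").getOrCreate()

    # Newly landed daily slice (assumed Parquet drop).
    increment = spark.read.parquet("/mnt/landing/customer_sales/2021-06-01")

    target = DeltaTable.forName(spark, "curated.pub_customer_sales")

    # Upsert: update matching rows, insert new ones.
    (target.alias("t")
        .merge(increment.alias("s"), "t.sale_id = s.sale_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

A Full or Backfill load would follow the same shape, with overwrite semantics in place of the merge.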

Environment: Databricks, Azure Data Lake Storage (Gen1), Docker, Oracle EDW, PySpark and Spark SQL (primarily), Scala Spark (occasionally), Kubernetes, Jenkins, PyCharm, Git, Spark BDA server, PuTTY for tunneling into Airflow environments, etc.

Confidential, Overland Park, KS.

Hadoop/Kafka Developer

Responsibilities:

  • Responsible for ingesting large volumes of IoT data into Kafka.
  • Developed microservices in Java using Spring Boot.
  • Identified the existing scripted-syntax Jenkins pipelines and suggested migrating to the declarative style to reduce deployment time.
  • Wrote Kafka producers to stream data from external REST APIs to Kafka topics (sketched after this list).
  • Experience working with security groups in the AWS cloud and with S3.
  • Good experience with continuous integration of applications using Jenkins.
  • Used Chef and Terraform as Infrastructure as Code (IaC) for defining Jenkins plugins.
  • Responsible for maintaining inbound rules of security groups and preventing duplication of EC2 instances.
  • Used Git and Docker for builds.
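
A minimal sketch of the REST-to-Kafka producer pattern described above, shown here with the kafka-python client for illustration (the actual producers in this role were Java/Spring Boot based). The endpoint URL, topic name, broker address, and polling interval are hypothetical.

    # Sketch: poll a REST endpoint and publish each record to a Kafka topic as JSON.
    import json
    import time

    import requests
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=["broker1:9092"],
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    while True:
        resp = requests.get("https://api.example.com/iot/readings", timeout=30)
        resp.raise_for_status()
        for reading in resp.json():           # assumes the endpoint returns a JSON array
            producer.send("iot-readings", value=reading)
        producer.flush()
        time.sleep(60)                        # poll once a minute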

Environment: Shell scripting, Git, AWS EMR, Kafka, AWS S3, AWS EC2, Java, Spring Boot, Eclipse IDE, Maven, Chef, Jenkins, Terraform, Docker, Infrastructure as Code (IaC), Cloudera (CDH).

Confidential, Chicago, IL

Hadoop/Spark Developer

Responsibilities:

  • Responsible for ingesting large volumes of user behavioral data and customer profile data to Analytics Data store.
  • Developed custom multi-threaded Java based ingestion jobs as well as Sqoop jobs for ingesting from FTP servers and data warehouses.
  • Developed many Spark applications for performing data cleansing, event enrichment, data aggregation, de-normalization and data preparation needed for machine learning exercise.
  • Worked on troubleshooting Spark applications to make them more error tolerant.
  • Worked on fine-tuning Spark applications to improve the overall processing time for the pipelines.
  • Wrote Kafka producers to stream data from external REST APIs to Kafka topics.
  • Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase (sketched after this list).
  • Experienced in handling large datasets using Spark's in-memory capabilities, broadcast variables, effective and efficient joins, transformations, and other capabilities.
  • Worked extensively with Sqoop for importing data from Oracle.
  • Experience working with EMR clusters in the AWS cloud and with S3.
  • Involved in creating Hive tables and loading and analyzing data using Hive scripts.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Good experience with continuous integration of applications using Jenkins.
  • Used reporting tools like Tableau connected to Impala to generate daily reports of the data.
  • Collaborated with the infrastructure, network, database, application and BA teams to ensure data quality and availability.
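
As a rough illustration of the Kafka-to-HBase streaming path described above, here is a Structured Streaming sketch (Spark 2.4+ with the spark-sql-kafka package) that writes each micro-batch to HBase through happybase. The original applications used DStreams; the broker, topic, table, and column family below are placeholders, and a production job would write per partition instead of collecting batches to the driver.

    # Sketch only: Kafka topic -> Structured Streaming -> HBase puts via happybase.
    import happybase
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kafka-to-hbase-sketch").getOrCreate()

    events = (spark.readStream
              .format("kafka")                                   # needs spark-sql-kafka on the classpath
              .option("kafka.bootstrap.servers", "broker1:9092")
              .option("subscribe", "user_events")
              .load()
              .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value"))

    def write_to_hbase(batch_df, batch_id):
        # One HBase put per keyed message; assumes messages carry a non-null key.
        conn = happybase.Connection("hbase-host")
        table = conn.table("events")
        for row in batch_df.collect():        # fine for a sketch; use foreachPartition at scale
            table.put(row["key"], {b"d:payload": row["value"].encode("utf-8")})
        conn.close()

    (events.writeStream
           .foreachBatch(write_to_hbase)
           .option("checkpointLocation", "/tmp/chk/kafka_to_hbase")
           .start()
           .awaitTermination())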

Environment: Spark, Hive, S3, Sqoop, Shell Scripting, AWS EMR, Kafka, AWS S3, Map Reduce, Scala, Eclipse, Maven, Cloudera (CDH)

Confidential, Denver, CO

Hadoop/Big Data Developer

Responsibilities:

  • Responsible for developing efficient MapReduce programs for more than 20 years’ worth of claim data to detect and separate fraudulent claims.
  • Developed MapReduce programs from scratch, ranging from medium to high complexity.
  • Uploaded and processed more than 30 terabytes of data from various structured and unstructured sources into HDFS using Sqoop and Flume.
  • Played a key role in setting up a 100-node Hadoop cluster utilizing MapReduce by working closely with the Hadoop administration team.
  • Worked with the advanced analytics team to design fraud detection algorithms, then developed MapReduce programs to run those algorithms efficiently on huge datasets (a rough sketch of the map/reduce shape follows this list).
  • Developed Java programs to perform data scrubbing for unstructured data.
  • Responsible for designing and managing the Sqoop jobs that uploaded the data from Oracle to HDFS and Hive.
  • Created Hive tables to import large data sets from various relational databases using Sqoop, and exported the analyzed data back for visualization and report generation by the BI team.
  • Used Flume to collect log data with error messages across the cluster.
  • Designed and Maintained Oozie workflows to manage the flow of jobs in the cluster.
  • Played a key role in installation and configuration of the various Hadoop ecosystem tools such as, Hive, Pig, and HBase.
  • Successfully loaded files into HDFS from Teradata and loaded data from HDFS into Hive.
  • Experience in using Zookeeper and Oozie for coordinating the cluster and scheduling workflows
  • Developed Oozie workflows and scheduled it to run data/time dependent Hive and Pig jobs
  • Designed and developed Dashboards for Analytical purposes using Tableau.
  • Analyzed the Hadoop log files using Pig scripts to track errors.
  • Actively provided higher management with daily updates on project progress, including the classification levels in the data.
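
The MapReduce jobs above were written in Java; purely as an illustration of the map/reduce shape involved (aggregate claim amounts per member and flag outliers), here is a Hadoop Streaming style mapper and reducer in Python. The field positions, delimiter, and flagging threshold are hypothetical.

    # mapper.py - emit (member_id, claim_amount) from tab-delimited claim records.
    import sys

    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue                                  # skip malformed records
        member_id, claim_amount = fields[1], fields[4]
        print("%s\t%s" % (member_id, claim_amount))

    # reducer.py - sum claim amounts per member and flag totals above a threshold.
    import sys

    THRESHOLD = 100000.0
    current_id, total = None, 0.0

    for line in sys.stdin:
        member_id, amount = line.rstrip("\n").split("\t")
        if member_id != current_id:
            if current_id is not None and total > THRESHOLD:
                print("%s\t%.2f\tFLAGGED" % (current_id, total))
            current_id, total = member_id, 0.0
        total += float(amount)

    if current_id is not None and total > THRESHOLD:
        print("%s\t%.2f\tFLAGGED" % (current_id, total))

These two scripts would be submitted through the hadoop-streaming jar as the -mapper and -reducer arguments.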

Environment: Spark, Hive, Sqoop, Shell scripting, Oracle, Kafka, HBase, MapReduce, Scala, Eclipse, Maven, Teradata

Confidential

Java/J2ee Developer

Responsibilities:

  • Involved in Analysis, design and development of web applications based on J2EE.
  • Used the Struts framework to manage navigation and page flow.
  • Developed EJB session beans that act as a facade, accessing the business entities through their local home interfaces.
  • Designed the user interface using HTML, CSS, JavaScript, and jQuery.
  • Used Log4j to debug and generate new logs for the application.
  • Used JDBC for accessing the data from the Oracle database. Created database tables, stored procedures using PL/SQL in Oracle DB.
  • Implemented client-side validation on web forms as per the requirements.
  • Involved in various phases of Software Development Life Cycle (SDLC) of the application like Requirement gathering, design, development and documentation.
  • The application is designed using J2EE design patterns and technologies based on MVC architecture
  • Involved in designing the user interfaces using HTML, CSS and JavaScript for client-side validation.
  • Developed custom tags and JSTL to support custom user interfaces.
  • Handled business logic in the model layer using helper classes, and used servlets as controllers to manage application flow and perform server-side validation.
  • Involved in servlet and JavaBean programming on the server side for communication between clients and the server.
  • Experienced in developing code to convert JSON data to custom JavaScript objects.
  • Developed Servlets and JSPs based on MVC pattern using Struts framework.
  • Provided support for Production and Implementation Issues. Involved in end-user/client training of the application.
  • Performed Unit Tests on the application to verify and identify various scenarios.
  • Used Eclipse for development, Testing, and Code Review.
  • Involved in the release management process to QA/UAT/Production regions.
  • Used Maven to build the application EAR for deployment on WebLogic application servers.
  • Developed the project in an Agile environment.

Environment: J2EE, Java, Eclipse, EJB, Java Beans, JDBC, JSP, Struts, Design Patterns, BEA WebLogic, PL/SQL, DB2, UML, CVS, JUnit, Log4j.
