Senior Big Data Developer Resume Louisville, KY - Hire IT People

PROFESSIONAL SUMMARY:

8 years of total IT experience, including Java application development, database management, and Big Data technologies in the Hadoop ecosystem.
6 years of experience in Big Data analytics using various Hadoop ecosystem tools and the Spark framework.
Solid understanding of Distributed Systems Architecture, MapReduce and Spark execution frameworks for large scale parallel processing.
Worked extensively on Hadoop ecosystem components: MapReduce, Pig, Hive, HBase, Flume, Sqoop, Hue, Oozie, Spark and Kafka.
Experience working with all major Hadoop distributions, including Cloudera (CDH), Hortonworks (HDP) and AWS EMR.
Developed highly scalable Spark applications using the Spark Core, DataFrame, Spark SQL and Spark Streaming APIs in Scala.
Gained good experience troubleshooting and fine-tuning Spark Applications.
Experience working with DStreams in Spark Streaming, accumulators, broadcast variables, various levels of caching and optimization techniques in Spark.
Worked on real time data integration using Kafka, Spark streaming and HBase.
In-depth understanding of NoSQL databases such as HBase and its Integration with Hadoop cluster.
Strong working experience in extracting, wrangling, ingestion, processing, storing, querying and analyzing structured, semi-structured and unstructured data.
Solid understanding of Hadoop MRv1 and Hadoop MRv2 (YARN) architecture.
Developed, deployed and supported several MapReduce applications in Java to handle semi-structured and unstructured data.
Sound knowledge of map-side joins, reduce-side joins, shuffle and sort, the distributed cache, compression techniques, and multiple Hadoop input and output formats. Used Docker coupled with the load-balancing tool Nginx to achieve continuous delivery in a highly scalable environment.
Solid experience working with CSV, text, sequence file, Avro, Parquet, ORC and JSON data formats.
Expertise in working with the Hive data warehouse tool: creating tables, distributing data through static and dynamic partitioning and bucketing, and optimizing HiveQL queries.
Involved in ingesting structured data from SQL Server, MySQL and Teradata into HDFS and Hive using Sqoop. Experience in writing ad hoc queries in Hive and analyzing data using HiveQL.
Extensive experience in performing ETL on structured, semi-structured data using Pig Latin Scripts.
Expertise in moving structured schema data between Pig and Hive using HCatalog.
Proficient in creating Hive DDLs and Hive UDFs. Designed and implemented Hive and Pig UDFs in Python and Java for evaluating, filtering, loading and storing data.
Experience in migrating the data using Sqoop from HDFS and Hive to Relational Database System and vice-versa according to client's requirement.
Virtualized servers using Docker for test and dev environment needs, and automated configuration using Docker containers.
Experienced in working with Amazon Web Services (AWS), using EC2 for computing and S3 for storage. Familiar with Kerberos.
Experienced in job workflow scheduling and monitoring tools like Oozie.
Proficient knowledge and hands on experience in writing shell scripts in Linux.
Developed core modules in large cross-platform applications using Java, JSP, Servlets, Hibernate, RESTful services, JDBC, JavaScript, XML and HTML.
Experience in creating Docker containers leveraging existing Linux containers and AMIs, in addition to creating Docker containers from scratch.
Extensive experience in developing and deploying applications using WebLogic, Apache Tomcat and JBoss. Worked on Podium and Talend.
Development experience with RDBMS, including writing SQL queries, views, stored procedures and triggers, as well as with data lakes.
Strong understanding of Software Development Lifecycle (SDLC) and various methodologies (Waterfall, Agile).

TECHNICAL SKILLS:

Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Hue, Ambari, Zookeeper, Kafka, Apache Spark, Spark Streaming, Impala, HBase

Hadoop Distributions: Cloudera, Hortonworks, Apache, AWS EMR, Docker, Databricks

Languages: C, Java, PL/SQL, Python, Pig Latin, HiveQL, Scala, Regular Expressions

IDE & Build Tools, Design: Eclipse, NetBeans, IntelliJ, JIRA, Microsoft Visio, PyCharm

Web Technologies: HTML, CSS, JavaScript, XML, JSP, RESTful, SOAP

Operating Systems: Windows (XP,7,8,10), UNIX, LINUX, Ubuntu, CentOS

Reporting Tools: Tableau, Docker, Power View for Microsoft Excel, Talend, MicroStrategy

Databases: Oracle, SQL Server, MySQL, MS Access, NoSQL Database (HBase, Cassandra, MongoDB), Teradata, IBM DB2

Build Automation tools: SBT, Ant, Maven

PROFESSIONAL EXPERIENCE:

Confidential, Louisville, KY

Senior Big Data Developer

Responsibilities:

Strong understanding of and extensive experience in developing PySpark and Spark-Scala applications; able to lead development and maintenance independently.
Working knowledge in Apache Spark and Spark streaming with Kafka as an Input source
Experience in configuring Kafka brokers, consumers and producers for optimal performance.
Good understanding of the internals of Kafka design, message compression and replication.
Experience in integrating Kafka with other tools for logging and packaging.
Worked closely with the Meijer data and gained in-depth understanding and knowledge of retail domain.
Created private cloud using Kubernetes that supports DEV, TEST, and PROD environments.
Successfully led the development, deployment and maintenance of Meijer’s householding process.
In-depth knowledge and hands-on experience in dealing with Apache Hadoop components like HDFS, HiveQL, Pig, Hive and Sqoop.
Experience using Amazon Web Services including Kinesis, Lambda, SQS/SNS, S3, RDS
Experience in reading from and writing data to Amazon S3 in Spark Applications.
Experience in selecting and configuring the right Amazon EC2 instances and accessing key AWS services using client tools.
Built and maintained Docker container clusters managed by Kubernetes on GCP, using Linux, Bash, Git and Docker.
Worked with the OpenShift platform to manage Docker containers and Kubernetes clusters.
Knowledge of using AWS Identity and Access Management (IAM) to secure access to EC2 instances, and of configuring auto-scaling groups using CloudWatch.
Capable of processing large sets of structured and semi-structured data. Processed text and Avro data and performed transformations using Spark RDDs and DataFrames.
Perform Hive querying on an ad-hoc basis to resolve customer queries and to generate reports as required.
Used CI/CD tools (Jenkins, Git/GitLab, Jira and the Docker registry/daemon) and Ansible for configuration management and automation.
Processed unstructured log files with PySpark to generate statistics and load the details into Hive tables.
Experience in executing Hive queries using Hue, Impala and Hive shell. In-depth experience in writing complex Hive queries using Joins, nested queries, aggregations and window functions.
Created Hive tables, dynamic partitions and buckets, and worked on them using HiveQL.
In-depth experience in automating various PySpark, Hive, Bash and Scala applications using Airflow and Oozie
Trained staff on effective use of Jenkins, Docker, GitLab and Kubernetes
Worked extensively on Airflow and successfully deployed an Airflow DAG of 64 instances comprising both parallel and sequential executions.
Loaded data into HDFS from dynamically generated files and relational database management systems using Sqoop.
Migrated Hive projects with multiple Hive queries into PySpark transformations and actions using RDDs and DataFrames.
Good knowledge on GIT commands, version tagging and pull requests
Performed unit testing and also integration testing after the development and participated in code reviews.
Interact with business analysts to understand the business requirements and translate them to technical requirements
Collaborate with various technical experts, architects and developers for design and implementation of technical requirements

Environment: Airflow, Spark 1.6.0, Spark 2.2, Python 2.7, Python 3, Kubernetes, Scala 2.10.5, Hadoop 2.6.0-cdh5.7.0, cdh5.13.1, cdh1.15.0, Java 1.8.0_92, Spark SQL, R programming, MongoDB, Visual Studio, PyCharm, Apache Hive 1.1.0, HDFS, Oozie, Maven, Docker, IntelliJ, Git

Confidential, Framingham, MA

Sr. Application Developer (Spark)

Responsibilities:

Development and review of Spark code containing Airflow DAGs, Databricks notebooks, Delta table DDLs, metadata SQL and other SQL scripts.
Deploying the code to Dev, QA, PreProd and Prod environments by adhering to the Git process flow and following the standards set by the release management process.
Creating Technical Design Documentation and Support/OPS Turnover documentation by following the OPS checklist.
Raising a Change Request once the code is in PreProd.
Airflow Orchestration especially configuring the DAG start date and scheduled time and other parameters.
Managed local deployments in Kubernetes, creating local cluster and deploying application containers.
Managed containers using Docker by writing Dockerfiles, set up automated builds on Docker Hub, and installed and configured Kubernetes.
Mainly developed PySpark code in Databricks using existing load patterns (Full, Incremental and Backfill) for forecasting (Region and Country) on rawCustomerSales and pubCustomerSales.
Built Spark DataFrames mainly from CSV, Parquet and Delta files. Used Spark SQL, joins, views and partitioning extensively.
Validated the source data and generated output data in the required format using PySpark transformations.
Submitted jobs to clusters administered by other Linux teams.

Environment: Databricks, Azure Data Lake Storage (Gen1), Docker, Oracle EDW, PySpark (mainly), Spark SQL, Scala Spark (occasionally), Kubernetes, Jenkins, PyCharm, Git, Spark BDA server, PuTTY for tunneling into Airflow environments.

Confidential, Overland Park, KS.

Hadoop/Kafka Developer

Responsibilities:

Responsible for ingesting large volumes of IoT data into Kafka.
Developed microservices in Java using Spring Boot.
Identified that the existing Jenkins pipelines used Scripted syntax and suggested changing to Declarative style to reduce deployment time.
Wrote Kafka producers to stream data from external REST APIs to Kafka topics.
Experience working with security groups in the AWS cloud and with S3.
Good experience with continuous Integration of application using Jenkins.
Used Chef and Terraform for Infrastructure as Code (IaC) to define Jenkins plugins.
Responsible for maintaining inbound rules of a security group(s) and preventing duplication of EC2 instances.
Used Git and Docker for builds.

Environment: Shell Scripting, Git, AWS EMR, Kafka, AWS S3, AWS EC2, Java, Spring Boot, Eclipse IDE, Maven, Chef, Jenkins, Terraform, Docker, Infrastructure as a Service (IaaS), Cloudera (CDH).

Confidential, Chicago, IL

Hadoop/Spark Developer

Responsibilities:

Responsible for ingesting large volumes of user behavioral data and customer profile data to Analytics Data store.
Developed custom multi-threaded Java based ingestion jobs as well as Sqoop jobs for ingesting from FTP servers and data warehouses.
Developed many Spark applications for performing data cleansing, event enrichment, data aggregation, de-normalization and data preparation needed for machine learning exercise.
Worked on troubleshooting Spark applications to make them more error tolerant.
Worked on fine-tuning Spark applications to improve the overall processing time for the pipelines.
Wrote Kafka producers to stream data from external REST APIs to Kafka topics.
Wrote Spark Streaming applications to consume the data from Kafka topics and write the processed streams to HBase.
Experienced in handling large datasets using Spark's in-memory capabilities, broadcast variables, efficient joins, transformations and other capabilities.
Worked extensively with Sqoop for importing data from Oracle.
Experience working with EMR clusters in the AWS cloud and with S3.
Involved in creating Hive tables, and loading and analyzing data using Hive scripts.
Implemented partitioning, dynamic partitions and buckets in Hive.
Good experience with continuous Integration of application using Jenkins.
Used Reporting tools like Tableau to connect with Impala for generating daily reports of data.
Collaborated with the infrastructure, network, database, application and BA teams to ensure data quality and availability.

Environment: Spark, Hive, Sqoop, Shell Scripting, AWS EMR, Kafka, AWS S3, MapReduce, Scala, Eclipse, Maven, Cloudera (CDH)

Confidential, Denver, CO

Hadoop/Big Data Developer

Responsibilities:

Responsible for developing efficient MapReduce programs for more than 20 years’ worth of claim data to detect and separate fraudulent claims.
Developed MapReduce programs of medium to high complexity from scratch.
Uploaded and processed more than 30 terabytes of data from various structured and unstructured sources into HDFS using Sqoop and Flume.
Played a key role in setting up a 100-node Hadoop cluster utilizing MapReduce by working closely with the Hadoop Administration team.
Worked with the advanced analytics team to design fraud detection algorithms and then developed MapReduce programs to run the algorithms efficiently on the huge datasets.
Developed Java programs to perform data scrubbing for unstructured data.
Responsible for designing and managing the Sqoop jobs that uploaded the data from Oracle to HDFS and Hive.
Creating Hive tables to import large data sets from various relational databases using Sqoop and export the analyzed data back for visualization and report generation by the BI team
Used Flume to collect log data with error messages across the cluster.
Designed and Maintained Oozie workflows to manage the flow of jobs in the cluster.
Played a key role in the installation and configuration of various Hadoop ecosystem tools such as Hive, Pig and HBase.
Successfully loaded files to HDFS from Teradata, and loaded from HDFS to Hive.
Experience in using Zookeeper and Oozie for coordinating the cluster and scheduling workflows
Developed Oozie workflows and scheduled them to run data/time-dependent Hive and Pig jobs.
Designed and developed Dashboards for Analytical purposes using Tableau.
Analyzed the Hadoop log files using Pig scripts to identify errors.
Provided senior management with daily updates on project progress, including the classification levels in the data.

Environment: Spark, Hive, Sqoop, Shell Scripting, Oracle, Kafka, HBase, MapReduce, Scala, Eclipse, Maven, Teradata

Confidential

Java/J2EE Developer

Responsibilities:

Involved in Analysis, design and development of web applications based on J2EE.
Used the Struts framework to manage navigation and page flow.
Developed an EJB session bean that acts as a facade and accesses the business entities through their local home interfaces.
Designed the user interface using HTML, CSS, JavaScript and jQuery.
Used Log4j to debug and generate new logs for the application.
Used JDBC for accessing the data from the Oracle database. Created database tables, stored procedures using PL/SQL in Oracle DB.
Implemented client-side validation on web forms as per the requirements.
Involved in various phases of Software Development Life Cycle (SDLC) of the application like Requirement gathering, design, development and documentation.
The application is designed using J2EE design patterns and technologies based on MVC architecture
Involved in designing the user interfaces using HTML, CSS and JavaScript for client-side validation.
Developed custom tags, JSTL to support custom User Interfaces.
Handled business logic in the model using helper classes, and used Servlets as the controller to manage application flow and server-side validation.
Involved in Servlets, Java Bean programming on the server side for the communication between clients and server.
Experienced in developing code to convert JSON data to custom JavaScript objects.
Developed Servlets and JSPs based on MVC pattern using Struts framework.
Provided support for Production and Implementation Issues. Involved in end-user/client training of the application.
Performed Unit Tests on the application to verify and identify various scenarios.
Used Eclipse for development, Testing, and Code Review.
Involved in the release management process to QA/UAT/Production regions.
Used Maven to build the application EAR for deployment on WebLogic application servers.
Developed the project in an Agile environment.

Environment: J2EE, Java, Eclipse, EJB, Java Beans, JDBC, JSP, Struts, Design Patterns, BEA WebLogic, PL/SQL, DB2, UML, CVS, JUnit, Log4j.
