Big Data Developer Resume
Tampa, FL
SUMMARY
- 9+ years of experience with emphasis on Big Data technologies, development, and design of Java-based enterprise applications.
- In-depth knowledge of the Hadoop ecosystem and its components, such as HDFS, Job Tracker, Name Node, Data Node, MapReduce, and YARN.
- Handled different file formats including Parquet, ORC, delimited and fixed-width files, Avro, SequenceFile, JSON, XML, and flat files.
- Experienced in performance tuning and real-time analytics on both relational databases and NoSQL databases (HBase); wrote Hive UDFs and generic UDFs to incorporate complex business logic into Hive queries.
- Hands-on experience ingesting data from existing relational databases (Oracle, MySQL, DB2, SQL Server, and Teradata) that provide SQL interfaces, using Sqoop.
- Hands-on experience using Hadoop technologies such as MapReduce, HDFS, Hive, Spark, Oozie, Pig, Kafka, NiFi, Impala, and ZooKeeper.
- Experienced in optimizing Hive queries by tuning configuration parameters; developed MapReduce jobs to automate data transfer from HBase.
- Implemented built-in Spark operators such as map, flatMap, filter, reduceByKey, groupByKey, aggregateByKey, and combineByKey (an illustrative sketch follows this summary).
- Experienced in the Banking, Financial, and E-learning domains; used HCatalog with Hive and Pig and worked on Scala with Spark.
- Worked with join patterns and implemented map-side and reduce-side joins using MapReduce; in-depth knowledge of handling large amounts of data using the Spark DataFrame/Dataset API and case classes.
- Designed Hive queries and Pig scripts to perform data analysis, data transfer, and table design to load data into the Hadoop environment.
- Experienced in Apache Spark, implementing advanced procedures such as text analytics and processing using its in-memory computing capabilities, written in Scala.
- Exposure to Spark, Spark Streaming, Spark MLlib, and Scala; created DataFrames in Spark with Scala while working on a Kafka cluster.
- Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts in Spark, and effective and efficient joins and transformations during the ingestion process itself.
- Implemented Sqoop for large dataset transfer between Hadoop and RDBMS; defined extract-transform-load (ETL) and extract-load-transform (ELT) processes for the data lake.
- Good expertise in planning, installing, and configuring Hadoop clusters based on business needs, working with cloud environments such as Amazon Web Services (AWS) EC2 and S3.
- Experience composing shell scripts to dump shared information from MySQL servers to HDFS; transformed and aggregated data for analysis by implementing workflow management of Sqoop, Hive, and Pig scripts.
- Experience in performance tuning and monitoring the Hadoop cluster by gathering and analyzing the existing infrastructure using Cloudera Manager.
- Designed DataStage jobs to extract, transform, and load data from various source and archive systems such as Oracle, text files, XML, CSV, IBM DB2, MS SQL Server, and Teradata into data warehouses and data marts.
- Handled errors using exception handling extensively for ease of debugging and for displaying error messages in the application.
- Designed, implemented, and analyzed relational database (OLTP) and data warehousing (OLAP) systems; proficient in writing, implementing, and testing triggers, procedures, and functions in PL/SQL and Oracle.
- Experienced in database programming for data warehouse schemas; proficient in dimensional modeling (star schema and snowflake modeling).
- Expertise in UNIX shell scripts using KornShell for process automation and scheduling DataStage jobs using wrappers.
- Designed and developed ETL integration patterns using Python on Spark.
- Developed Spark/Scala and Python code for a regular expression (regex) project in the Hadoop/Hive environment on Linux/Windows for big data resources.
- Experience working with version control tools such as GitHub (Git) and Subversion (SVN), and software build tools such as Apache Maven.
- Worked with AWS EC2 and CloudWatch services; managed CI/CD pipelines through Jenkins; automated manual tasks using shell scripting.
- Experience in various methodologies like Waterfall and Agile.
- Strong experience in migrating other databases to Snowflake.
- Developed a framework for converting existing PowerCenter mappings to PySpark (Python and Spark) jobs.
- Good experience working with Tableau and Spotfire, enabling JDBC/ODBC data connectivity from those tools to Hive tables.
- Designed clean and insightful dashboards in Tableau; expert in unit testing, system integration testing, and implementation and maintenance of database jobs.
- Adept in Agile/Scrum methodology and familiar with the SDLC from requirement analysis through system study, design, testing, debugging, documentation, and implementation; understanding of and experience with software development methodologies such as Agile and Waterfall.
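Illustrative sketch referenced in the Spark operators bullet above: a minimal, self-contained pair-RDD example using reduceByKey, aggregateByKey, filter, and map. The dataset, account identifiers, and application name are hypothetical placeholders, not details taken from the projects in this resume.

```scala
import org.apache.spark.sql.SparkSession

object PairRddOperators {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("pair-rdd-operators").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical (account, amount) transaction records.
    val txns = sc.parallelize(Seq(("acct-1", 120.0), ("acct-2", 75.5), ("acct-1", 30.0)))

    // reduceByKey: total amount per account.
    val totals = txns.reduceByKey(_ + _)

    // aggregateByKey: (sum, count) per account, from which an average is derived.
    val sumCount = txns.aggregateByKey((0.0, 0))(
      (acc, v) => (acc._1 + v, acc._2 + 1),
      (a, b) => (a._1 + b._1, a._2 + b._2))
    val averages = sumCount.mapValues { case (sum, count) => sum / count }

    // filter + map: keep large transactions and reshape them for downstream use.
    val large = txns.filter { case (_, amount) => amount > 100.0 }
      .map { case (acct, amount) => s"$acct:$amount" }

    totals.collect().foreach(println)
    averages.collect().foreach(println)
    large.collect().foreach(println)
    spark.stop()
  }
}
```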
TECHNICAL SKILLS
Big Data Technologies: Hadoop, HDFS, Hive, MapReduce, Pig, Sqoop, Flume, Oozie, Hadoop distributions, HBase, Spark
Programming Languages: Java (5, 6, 7), Python, Scala, C/C++, XML, Shell scripting, COBOL
Databases/RDBMS: MySQL, SQL/PL-SQL, MS-SQL Server 2005, Oracle …
Scripting/Web Languages: JavaScript, HTML5, CSS3, XML, SQL, Shell, jQuery, AJAX
NoSQL/ETL Tools: Cassandra, HBase, Elasticsearch, Alteryx
Operating Systems: Linux, Windows XP/7/8
Software Life Cycles: SDLC, Waterfall and Agile models
Office Tools: MS-Office, MS-Project and Risk Analysis tools, Visio
Utilities/Tools: Eclipse, Tomcat, NetBeans, JUnit, SQL, SoapUI, Ant, Maven, automation tools, and MRUnit
Cloud Platforms: Amazon EC2
Version Control: Git, Tortoise SVN
Visualization Tools: Tableau.
Servers: IBM WebSphere, WebLogic, Tomcat, and Red Hat Satellite Server
PROFESSIONAL EXPERIENCE
Confidential, Tampa - FL
Big Data Developer
Responsibilities:
- Responsibilities included Management Information System (MIS) enhancements and sustenance of the data lakes and pipelines for better insight creation from the data.
- Owned the design and development of data pipeline jobs from different source systems; assisted in upgrading, configuring, and maintaining various Hadoop infrastructure components such as Pig, Hive, and HBase.
- Developed Oozie workflows to automate the tasks of loading data into HDFS and pre-processing, analyzing, and training the classifier using MapReduce, Pig, and Hive jobs.
- Developed Spark Streaming jobs consuming static and streaming data from sources such as SQL Server, EDW, and OLTP data stores.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data; worked extensively on Hive UDFs and fine-tuning.
- Responsible for loading data from UNIX file systems to HDFS; installed and configured Hive and wrote Pig/Hive UDFs.
- Developed end-to-end data processing pipelines that begin with receiving data through the Kafka distributed messaging system and persist the data into HBase.
- Uploaded and processed terabytes of data from various structured and unstructured sources into HDFS (AWS cloud) using Sqoop and Flume.
- Developed a data pipeline using Kafka, Sqoop, Hive, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Developed Sqoop scripts for importing and exporting data into HDFS and Hive, and Spark scripts using Scala shell commands as per the requirements.
- Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
- Worked extensively on importing metadata into Hive and migrated existing tables and applications to work on Hive and the AWS cloud.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Developed Spark code and Spark SQL/Streaming for faster testing and processing of data; supported MapReduce programs running on the cluster.
- Used Spark Streaming APIs to perform the necessary transformations and actions on the fly to build the common learner data model, which receives data from Kafka in real time (see the streaming sketch after this role).
- Developed multiple Kafka producers and consumers as per the software requirement specifications; monitored workload, job performance, and capacity planning using Cloudera Manager.
- Worked on Storm to handle parallelization, partitioning, and retrying on failures, and developed a data pipeline using Kafka and Storm to store data into HDFS.
- Worked in an Agile methodology and used JIRA to maintain project stories; involved in requirements gathering, design, development, and testing.
Environment: Hadoop, HDFS, Cloudera, Spark, YARN, MapReduce, Hive, PL/SQL, Pig, Kafka, Sqoop, HBase, DB2, Java, Scala, Flume.
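Streaming sketch referenced above: a minimal Kafka-to-HDFS flow using Spark Structured Streaming's Kafka source. This is one possible way to implement it (the project may equally have used the DStream API); broker addresses, topic name, payload layout, and output paths are hypothetical placeholders, and the spark-sql-kafka connector must be on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object KafkaLearnerStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("learner-model-stream") // hypothetical application name
      .getOrCreate()
    import spark.implicits._

    // Consume raw events from Kafka; broker list and topic are placeholders.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "learner-events")
      .load()

    // Kafka delivers key/value as binary; cast the value and apply a simple
    // on-the-fly transformation (parsing a comma-delimited payload here).
    val parsed = raw.selectExpr("CAST(value AS STRING) AS line")
      .select(split($"line", ",").as("fields"))
      .select($"fields".getItem(0).as("learner_id"),
              $"fields".getItem(1).as("event_type"),
              $"fields".getItem(2).as("event_ts"))

    // Persist the transformed stream; HDFS path and checkpoint are placeholders.
    val query = parsed.writeStream
      .format("parquet")
      .option("path", "/data/learner_model")
      .option("checkpointLocation", "/checkpoints/learner_model")
      .start()

    query.awaitTermination()
  }
}
```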
Confidential, Segundo - CA
Big Data Developer
Responsibilities:
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data; adept in the complete implementation lifecycle, specializing in writing custom MapReduce, Pig, and Hive programs.
- Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest claim data and financial histories into HDFS for analysis.
- Worked on importing data from HDFS to the MySQL database and vice versa using Sqoop; extensive experience in writing HDFS and Pig Latin commands.
- Responsible for writing Pig Latin scripts; used Python scripting to transfer data from one source to another.
- Developed UDFs to provide custom Hive and Pig capabilities and apply business logic to the data; created Hive internal/external tables with proper static and dynamic partitions.
- Implemented Spring Boot microservices to publish messages into the Kafka cluster; used Spring and Kafka API calls to process messages smoothly on the Kafka cluster setup (see the producer sketch after this role).
- Knowledgeable in partitioning Kafka messages and setting up replication factors in the Kafka cluster; used Hive to analyze unified historic data in HDFS to identify issues and behavioral patterns.
- Analyzed data by performing Hive queries and running Pig scripts to study customer behavior; tuned performance using partitioning and bucketing of Hive tables.
- Created and configured AWS RDS/Redshift to use the Hadoop ecosystem on AWS infrastructure; experienced in writing Spark applications in Scala and Python (PySpark).
- Experience with NoSQL databases such as HBase; created HBase tables and loaded large data sets coming from Linux, NoSQL, and MySQL sources.
- Developed Oozie workflows for executing Sqoop and Hive actions; capable of using AWS utilities such as EMR, S3, and CloudWatch to run and monitor Hadoop/Spark jobs on AWS.
- Installed the Oozie workflow engine to run multiple MapReduce, Hive, ZooKeeper, and Pig jobs that run independently based on time and data availability.
- Experienced in requirements gathering and test plan creation; constructed and executed positive/negative test cases in order to promptly surface and resolve all bugs within the QA environment.
Environment: HDFS, MapReduce, CDH5, Hive, PySpark, Pig, HBase, Sqoop, Flume, Oozie, ZooKeeper, AWS, MySQL, Java, Linux Shell Scripting, and XML.
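Producer sketch referenced above: a minimal example of publishing messages to the Kafka cluster. It uses the plain Kafka clients producer API rather than the Spring Kafka wrapper mentioned in the bullet, so it illustrates the idea rather than the project's actual setup; broker list, topic, key, and payload are hypothetical placeholders.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

object ClaimEventProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092") // placeholder broker list
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
    props.put(ProducerConfig.ACKS_CONFIG, "all") // wait for full in-sync-replica acknowledgment

    val producer = new KafkaProducer[String, String](props)
    try {
      // Key by claim id so records for the same claim land in the same partition.
      val record = new ProducerRecord[String, String]("claim-events", "claim-42", """{"status":"RECEIVED"}""")
      producer.send(record)
    } finally {
      producer.flush()
      producer.close()
    }
  }
}
```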
Confidential, Austin - TX
Hadoop Developer
Responsibilities:
- Worked on analyzing data and writing Hadoop MapReduce jobs using the Java API, Pig, and Hive; involved in loading data from the edge node to HDFS using shell scripting.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables for optimized performance (see the DDL sketch after this role).
- Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from the Oracle database into HDFS using Sqoop.
- Created, developed, modified, and maintained database objects, PL/SQL packages, functions, stored procedures, triggers, views, and materialized views to extract data from different sources.
- Experience in managing and reviewing Hadoop log files; created MapReduce jobs using Pig Latin and Hive queries; tuned Hive and Pig scripts to improve performance; configured a MySQL database to store Hive metadata.
- Good experience writing MapReduce programs in Java in an MRv2/YARN environment, as well as troubleshooting performance issues and tuning the Hadoop cluster.
- Imported and exported data into HDFS and Hive using Sqoop; administered, installed, upgraded, and managed HDP 2.2, Pig, Hive, and HBase.
- Involved in the end-to-end process of Hadoop jobs that used various technologies such as Sqoop, Pig, Hive, MapReduce, Spark, and shell scripts (for scheduling a few jobs).
- Used Tableau 9.0 to create reports representing the analysis in graphical format; knowledgeable in performance troubleshooting and tuning of Hadoop clusters.
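DDL sketch referenced above: managed vs. external partitioned tables and a dynamic-partition insert, issued through Spark SQL with Hive support. Table names, columns, locations, and the ORC format choice are hypothetical placeholders, not the project's actual schema; bucketing (CLUSTERED BY ... INTO n BUCKETS) would typically be added when running the DDL directly in Hive, since Spark's support for creating bucketed Hive tables is limited.

```scala
import org.apache.spark.sql.SparkSession

object HiveTableDesign {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-table-design")
      .enableHiveSupport()
      .getOrCreate()

    // External table: Hive tracks only metadata; dropping it leaves the HDFS files in place.
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS raw_orders (
        |  order_id BIGINT,
        |  customer_id BIGINT,
        |  amount DOUBLE)
        |PARTITIONED BY (load_date STRING)
        |STORED AS ORC
        |LOCATION '/data/raw/orders'""".stripMargin)

    // Managed table: dropping it removes both metadata and data.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS curated_orders (
        |  order_id BIGINT,
        |  customer_id BIGINT,
        |  amount DOUBLE)
        |PARTITIONED BY (load_date STRING)
        |STORED AS ORC""".stripMargin)

    // Dynamic-partition insert from the external staging table into the curated table.
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT OVERWRITE TABLE curated_orders PARTITION (load_date)
        |SELECT order_id, customer_id, amount, load_date FROM raw_orders""".stripMargin)

    spark.stop()
  }
}
```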
Confidential
Hadoop/Java Developer
Responsibilities:
- Extensively involved in the installation and configuration of the Cloudera distribution, Name Node, Secondary Name Node, Job Tracker, Task Trackers, and Data Nodes.
- Installed and configured Hadoop ecosystem components such as HBase, Flume, Pig, and Sqoop; involved in Hadoop cluster tasks such as adding and removing nodes without any effect on running jobs and data.
- Loaded log data into HDFS using Flume; worked extensively on creating MapReduce jobs to power data for search and aggregation (see the MapReduce sketch after this role).
- Worked extensively with Sqoop for importing metadata from Oracle; managed and reviewed Hadoop log files; designed a data warehouse using Hive.
- Created partitioned tables in Hive; mentored the analyst and test teams in writing Hive queries; extensively used Pig for data cleansing.
- Experience in developing business applications using JBoss, WebSphere, and Tomcat; used Perl scripting, shell scripting, and PL/SQL programming to resolve business problems of various natures.
- Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS; wrote Pig UDFs to pre-process the data for analysis.
- Implemented SQL and PL/SQL stored procedures; actively involved in code review and bug fixing to improve performance; developed screens using JSP, DHTML, CSS, AJAX, JavaScript, Struts, Spring, Java, and XML.
- Developed Oozie workflows to automate the tasks of loading data into HDFS and pre-processing with Pig; exposure to and knowledge of coordination services through ZooKeeper.
- Expertise in using the Spring, JSF, EJB, Hibernate, and Struts frameworks and development tools such as Eclipse, MyEclipse, and NetBeans.
- Excellent back-end SQL programming skills using MS SQL Server and Oracle with PL/SQL.
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, ZooKeeper, Oozie, Core Java, Spring MVC, Hibernate, UNIX Shell Scripting.
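MapReduce sketch referenced above: a minimal log-aggregation job against the Hadoop MapReduce API. The original jobs were written in Java; this sketch uses Scala for consistency with the other examples here, and the input layout (tab-delimited lines with an event type in the third column), field positions, and job name are hypothetical assumptions.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

// Mapper: emit (eventType, 1) for every tab-delimited log line (hypothetical layout).
class EventMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one = new IntWritable(1)
  private val outKey = new Text()

  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit = {
    val fields = value.toString.split("\t")
    if (fields.length > 2) {
      outKey.set(fields(2)) // assume the third column holds the event type
      context.write(outKey, one)
    }
  }
}

// Reducer (also used as combiner): sum the counts for each event type.
class EventReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      context: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
    var total = 0
    val it = values.iterator()
    while (it.hasNext) total += it.next().get()
    context.write(key, new IntWritable(total))
  }
}

object LogEventCounts {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "flume-log-event-counts")
    job.setJarByClass(classOf[EventMapper])
    job.setMapperClass(classOf[EventMapper])
    job.setCombinerClass(classOf[EventReducer])
    job.setReducerClass(classOf[EventReducer])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    FileInputFormat.addInputPath(job, new Path(args(0)))   // e.g. the Flume landing directory in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args(1))) // job output directory
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```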
Confidential
Software Developer
Responsibilities:
- Involved in all phases of the project life cycle from requirements gathering to quality assurance testing; developed class diagrams and sequence diagrams using Rational Rose.
- Responsible for developing rich web interface modules with Struts tags, JSP, JSTL, CSS, JavaScript, Ajax, and GWT; built the presentation layer using the Struts framework and performed validations using the Struts Validator plugin.
- Implemented J2EE design patterns such as the Singleton and Factory patterns; created SQL scripts for the Oracle database and implemented the business logic using Java, Spring Transactions, and Spring AOP.
- Implemented the persistence layer using Spring JDBC to store and update data in the database; produced web services using the WSDL/SOAP standard.
- Developed the application using the Spring MVC framework; performed client-side validations using AngularJS and Node.js.
- Extensively involved in creating session beans and MDBs using EJB 3.0; used the Hibernate framework for the persistence layer.
- Developed the user interface using JSP, HTML, CSS, and JavaScript to simplify the complexities of the application.
- Extensively involved in writing stored procedures for data retrieval, storage, and updates in the Oracle database using Hibernate.
- Built and deployed the application using Maven; performed unit testing with JUnit; used JIRA to track bugs and SVN for source code versioning and the code repository.
- Extensively used Log4j for logging throughout the application; produced a web service using REST with a Jersey implementation for providing customer information.
Environment: Java, J2EE, Eclipse, JSP, JavaScript, JSTL, Ajax, GWT, Log4j, CSS, XML, Spring, EJB, MDB, Hibernate, WebLogic, REST, Rational Rose, JUnit, Maven, JIRA, SVN.