Hadoop/Spark Developer Resume
Cleveland
SUMMARY
- 8+ years of total IT experience, including Java development, web application development, database management, and Big Data ecosystem technologies.
- Around 4 years of experience developing applications with Big Data technologies such as Hadoop, Spark, MapReduce, YARN, Flume, Hive, Pig, Kafka, Storm, Sqoop, HBase, Cassandra, Hortonworks, Cloudera, Apache NiFi, Mahout, Avro, and Scala.
- Hands-on experience with Hadoop administration, management, monitoring, debugging, and performance tuning.
- Skilled in programming with the MapReduce framework and the Hadoop ecosystem.
- Strong experience designing and implementing MapReduce jobs that support distributed processing of large data sets on a Hadoop cluster.
- Extracted and loaded data in MongoDB using the mongoimport and mongoexport command-line utilities.
- Worked with several Hadoop distributions, including Cloudera and Hortonworks.
- Experience working with cloud environments such as Amazon Web Services (AWS) EMR, EC2, and S3.
- Uploaded and processed terabytes of data from various structured and unstructured sources into HDFS (AWS cloud) using Sqoop.
- Experience developing and maintaining applications written for Amazon Simple Storage Service (S3), AWS Elastic MapReduce, and AWS CloudFormation.
- Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Developed a Spark application to filter JSON source data in an AWS S3 location and store it in HDFS with partitions, and used Spark to extract the schema of JSON files.
- Configured ZooKeeper to coordinate the servers in clusters and maintain data consistency.
- Implemented POCs to migrate MapReduce programs into Spark transformations using Spark and Scala.
- Good knowledge of scripting languages such as Linux/Unix shell scripting and Python, and of continuous integration and automated deployment and management using Jenkins and uDeploy.
- Hands-on experience developing UDFs, DataFrames, and SQL queries in Spark SQL.
- Proficient in Data Warehousing, Data Mining concepts and ETL transformations from source to target systems.
- Implemented database-driven applications in Java using JDBC, JSON, XML APIs, and the Hibernate framework.
- Deployment and implementation of distributed enterprise applications in a J2EE environment.
- Comprehensive knowledge of Software Development Life Cycle (SDLC), having thorough understanding of various phases like Requirements Analysis, Design, Development and Testing.
- Proficient in developing web pages quickly and effectively using HTML5, CSS3, JavaScript, and jQuery, with experience making web pages cross-browser compatible.
- Proficient in PL/SQL for data warehousing, with strong experience implementing data warehousing methodologies.
- Worked under Agile and Waterfall software life cycle methodologies, including estimating project timelines.
TECHNICAL SKILLS
Big Data Skillset - Frameworks & Environments: Cloudera CDH, Hortonworks HDP, Hadoop 1.0, Hadoop 2.0, HDFS, MapReduce, Pig, Hive, Impala, HBase, Cassandra, MongoDB, Mahout, Sqoop, Oozie, ZooKeeper, Flume, Splunk, Spark, Storm, Kafka, YARN, Falcon, Avro.
Amazon Web Services (AWS): Elastic MapReduce (EMR 4.1.0) cluster, EC2 instances, Airflow, Amazon S3, Amazon Redshift, EMRFS, Ruby, Kinesis (streams), AWS CodeCommit, AWS CodeDeploy, AWS CodePipeline, Amazon CloudFront, AWS Import/Export.
Java & J2EE Technologies: Core Java (Java 8 and JavaFX), Hibernate framework, Spring framework, JSP, Servlets, JavaBeans, JDBC, JavaScript, jQuery, JSF, PrimeFaces, XML, EJB, HTML, XHTML, CSS, SOAP, XSLT, and DHTML.
Messaging Services: JMS, MQ Series, MDB.
J2EE MVC Frameworks: Struts, Struts 2.1, Spring 3.2, MVC, Spring Web Flow, AJAX.
IDE Tools: Eclipse, NetBeans, Spring Tool Suite, Hue (Cloudera specific).
Web Services & Technologies: XML, HTML, XHTML, JNDI, HTML5, AJAX, jQuery, CSS, JavaScript, AngularJS, VBScript, WSDL, SOAP, JDBC, ODBC; architectures: REST, MVC.
Databases & Application Servers: Oracle, MySQL, DB2 8.x/9.x, Cassandra, MongoDB, HBase, Teradata, Microsoft SQL Server 2000, PostgreSQL.
Other Tools: PuTTY, WinSCP, Data Lake, Talend, Tableau, GitHub, SVN, CVS.
PROFESSIONAL EXPERIENCE
Hadoop/Spark Developer
Confidential, Cleveland
Responsibilities:
- Worked on a live 24 node Hadoop cluster running on HDP 2.2.
- Built import and export jobs to copy data to and from HDFS using Sqoop.
- Created external and internal tables using HAWQ.
- Worked with the Spark Core, Spark Streaming, and Spark SQL modules of Spark.
- Hands-on experience in various Big Data application phases such as data ingestion, data analytics, and data visualization.
- Migrated code from Hive to Apache Spark and Scala using Spark SQL and RDDs.
- Well versed in workflow scheduling and monitoring tools such as Oozie, Hue, and ZooKeeper.
- Coded real-time processing with Spark Streaming and Apache NiFi to store data in Hive and HBase.
- Troubleshot production issues promptly as tickets were raised.
- Implemented NiFi flow topologies to perform cleansing operations before moving data into HDFS.
- Experience working with Flume to load log data from multiple sources directly into HDFS.
- Installed and configured MapReduce, Hive, and HDFS; implemented CDH5 and HDP clusters on CentOS.
- Experience collecting, aggregating, and moving data from various sources using Apache Flume and Kafka.
- Tested Apache Tez, an extensible framework for building high-performance batch and interactive data processing applications, on Pig and Hive jobs.
- Worked with AWS cloud services such as EC2, S3, EBS, RDS, and VPC.
- Experience in reviewing Hadoop log files to detect failures.
- Benchmarked the NoSQL databases Cassandra and HBase.
- Worked with Pig, Sqoop, and the NoSQL database HBase to analyze the Hadoop cluster and big data.
- Knowledge of workflow/schedulers like Oozie/crontab/Autosys.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
- Debugged production issues tracked in JIRA.
- Created Hive tables and worked on them for data analysis to meet business requirements.
- Developed a data pipeline using Spark and Hive to ingest, transform, and analyze data.
- Experience in using Sequence files, RCFile, AVRO and HAR file formats.
- Used Flume to dump application server logs into HDFS.
- Automated backups with Linux shell scripts to transfer data to an S3 bucket.
- Provided L1/L2/L3 support for production issues.
- Hands on experience using HP ALM. Created test cases and uploaded into HP ALM.
Environment: Hadoop, MapReduce, AWS, HDFS, Hive, HBase, Sqoop, Pig, Flume, Oracle 11/10g, Apache NIFI, DB2, Teradata, MySQL, HAWQ, PL/SQL, Java, Linux, Shell Scripting, SQL Developer, HP ALM.
Hadoop Developer
Confidential - Plano, TX
Responsibilities:
- Analyzed the Hadoop stack and various big data analytics tools, including Pig, Hive, HBase, and Sqoop.
- Developed Map Reduce programs for some refined queries on big data.
- Experienced in working with Elastic MapReduce (EMR).
- Created Hive tables and worked on them for data analysis to meet the requirements.
- Worked with business team in creating Hive queries for ad hoc access.
- In depth understanding of Classic MapReduce and YARN architectures.
- Implemented Hive generic UDFs for business logic.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Installed and configured Pig for ETL jobs.
- Developed Pig UDFs to pre-process the data for analysis.
- Deployed Cloudera Hadoop Cluster on AWS for Big Data Analytics.
- Analyzed data by running Hive queries, Pig scripts, Spark SQL, and Spark Streaming.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Used Apache NiFi to copy data from the local file system to HDFS.
- Developed a Spark Streaming script that consumes topics from Kafka, a distributed messaging source, and periodically pushes batches of data to Spark for real-time processing.
- Extracted files from Cassandra through Sqoop and placed them in HDFS for further processing.
- Helped create a generic Sqoop import script for loading data from RDBMS into Hive tables.
- Involved in continuous monitoring of operations using Storm.
- Developed workflows in Oozie to automate loading data into HDFS and pre-processing it with Pig.
- Implemented indexing of Oozie logs into Elasticsearch.
Environment: Hortonworks, Hadoop, Map Reduce, HDFS, Hive, Pig, Sqoop, Apache Kafka, Apache Storm, Oozie, SQL, Flume, Spark, HBase, Cassandra, Informatica, Java, GitHub.
Hadoop Developer
Confidential - Bloomington, IL
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Imported and exported data between HDFS and an Oracle database using Sqoop.
- Experience in installing, configuring Hadoop cluster for major Hadoop distributions.
- Created partitions and buckets by state in Hive to handle structured data.
- Implemented dashboards that internally run HiveQL queries, including aggregation functions, basic Hive operations, and different kinds of join operations.
- Implemented state-based business logic in Hive using generic UDFs.
- Developed workflow in Oozie to orchestrate a series of Pig scripts to cleanse data.
- Used Pig for three distinct workloads: pipelines, iterative processing, and research.
- Extensively used Pig to communicate with Hive using HCatalog and with HBase using handlers.
- Implemented MapReduce jobs to write data into Avro format.
- Implemented various MapReduce jobs in custom environments and updated HBase tables with their results by generating Hive queries.
- Used Sqoop for file transfers through HBase tables to process data into several NoSQL databases, including Cassandra and MongoDB.
- Developed Hive UDFs and reused them for other requirements.
- Worked on join operations.
- Worked on data analytics using Pig and Hive on Hadoop.
- Evaluated Oozie for workflow orchestration in the automation of MapReduce jobs, Pig and Hive jobs.
- Extracted files from MongoDB through Sqoop and placed in HDFS and processed.
- Captured the data logs from web server into HDFS using Flume & Splunk for analysis.
- Experienced in writing Pig scripts and Pig UDFs to pre-process the data for analysis.
- Experienced in managing and reviewing Hadoop log files.
- Built front end using JSP, JSON, Servlets, HTML and JavaScript to create user friendly and appealing interface.
- Reporting expertise using Talend.
- Gained very good business knowledge on health insurance, claim processing, fraud suspect identification, appeals process etc.
Environment: Hive, Pig, MapReduce, Avro, Sqoop, Oozie, Flume, Kafka, Storm, HBase, HDFS, Talend, Splunk, GitHub, Java, SQL, Linux, UNIX Shell & Python Scripting.
Hadoop Developer
Confidential - Brentwood, TN
Responsibilities:
- Worked on analyzing and writing Hadoop MapReduce jobs using the Java API, Pig, and Hive.
- Involved in loading data from edge node to HDFS using shell scripting.
- Created HBase tables to store variable data formats of PII data coming from different portfolios.
- Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Hive UDFs, Pig, Sqoop, ZooKeeper, and Spark.
- Developed custom aggregate functions using Spark SQL and performed interactive querying. Used Pig to store the data in HBase.
- Used Pig to parse the data and store it in Avro format.
- Stored the data in tabular formats using Hive tables and Hive SerDes.
- Implemented a script to transmit information from Oracle to HBase using Sqoop.
- Worked on tuning the performance of Pig queries.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Implemented MapReduce programs to handle semi/unstructured data like XML, JSON, and sequence files for the log files.
Environment: Hadoop, HDFS, Pig, Sqoop, Spark, MapReduce, Cloudera, Snappy, Zookeeper, NoSQL, HBase, Ubuntu, Linux Red Hat.
Java Developer
Confidential
Responsibilities:
- Designed and developed the server-side layer using XML, JSP, JDBC, JNDI, EJB, and DAO patterns in the Eclipse IDE.
- Developed JavaBeans and JSPs using Spring and JSTL tag libraries for supplements.
- Developed EJBs, Servlets, and JSP files to implement business rules and security options using IBM WebSphere.
- Designed and implemented the architecture for the project using OOAD, UML, and design patterns.
- Extensively used HTML, CSS, JavaScript, and Ajax for client-side development and validations.
- Created tables and stored procedures in SQL for data manipulation and retrieval using SQL Server, Oracle, and DB2.
- Participated in requirement gathering and converting the requirements into technical specifications.
Environment: Java, Eclipse IDE, Ajax, Apache Axis, OOAD, UML, JavaScript, HTML, XML, CSS, SQL Server, Oracle, Web services, Spring.
Java Software
Confidential
Responsibilities:
- Developed the user interface screens using Swing for accepting various system inputs such as contractual terms and monthly data pertaining to production, inventory, and transportation.
- Involved in design and development of UI using HTML, JavaScript and CSS.
- Used DispatchAction to group related actions into a single class.
- Applied J2EE design patterns like business delegate, DAO and singleton.
- Actively involved in testing, debugging and deployment of the application on WebLogic application server.
- Developed test cases and performed unit testing using JUnit.
- Involved in fixing bugs and minor enhancements for the front-end modules.
Environment: Java, HTML, JavaScript, CSS, Oracle, J2EE, DAO, ANT tool, SQL, Swing and Eclipse.
