Big Data Developer Resume
Livonia, MI
PROFESSIONAL SUMMARY:
- Certified Hadoop and Spark Developer (Cloudera and Hortonworks) with 8+ years of experience in developing applications using SQL, Java, Spark, AWS and Big Data technologies.
- 4 years of experience with Big Data tools.
- In-depth knowledge and experience in using Hadoop ecosystem tools like HDFS, MapReduce, YARN, Pig, Hive, Sqoop, Kafka, Flume, Oozie and ZooKeeper.
- Excellent understanding and extensive knowledge of Hadoop architecture and its ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm, along with strong knowledge of rack-awareness topology.
- Experienced with Apache Hadoop as well as the enterprise distributions from Cloudera, Hortonworks and MapR.
- Expert in importing and exporting data between relational database systems like MySQL and Oracle and HDFS.
- Hands-on experience with data ingestion tools such as Kafka and Flume, and the workflow management tool Oozie.
- Experience in analyzing data in NoSQL databases like HBase, Cassandra and DynamoDB.
- Capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
- Experience in using data visualization tools such as Tableau.
- Expertise in extending Hive and Pig core functionality by writing custom UDFs.
- Spark Core, Spark SQL and Spark Streaming:
- Developed Spark applications for data transformations and loading into HDFS using RDDs and DataFrames (a minimal sketch follows this summary).
- Extensive knowledge of performance tuning of Spark applications and converting Hive/SQL queries into Spark transformations.
- Hands-on experience handling different file formats like JSON, Avro, ORC and Parquet, and compression codecs such as Snappy, zlib and LZ4.
- Experience in executing batch jobs over data streams using Spark Streaming.
- Hands-on experience with AWS (Amazon Web Services): using Elastic MapReduce (EMR), creating and storing data in S3 buckets and creating Elastic Load Balancers (ELB).
- Extensive knowledge of creating a Hadoop cluster on multiple EC2 instances in AWS, configuring the instances and using IAM (Identity and Access Management) to create groups and users and assign permissions.
- Hands-on experience with Data Pipeline, moving data between S3 and DynamoDB.
- Extensive programming experience in core Java concepts such as OOP, multithreading, collections and I/O.
- Extensive experience with UNIX commands, shell scripting and setting up CRON jobs.
- Good experience in using Relational databases like Oracle, MySQL etc.
- Experience in working with build tools like Maven, SBT and Gradle to build and deploy applications to servers.
- Expertise in Object-Oriented Analysis and Design (OOAD) and knowledge of the Unified Modeling Language (UML).
- Expertise in complete Software Development Life Cycle (SDLC) in Waterfall and Agile.
- Experience in software configuration Management using Git.
- Experience in using IDEs like Eclipse, NetBeans and IntelliJ.
- Experience in developing web interfaces using JSP, Java Swing and HTML.
- Comprehensive Knowledge on software development using Shell scripting, Core Java and Web technologies.
- Experience in working with Java, JDBC, ODBC, JSP, Servlets and JavaBeans.
- Developed stored procedures and queries using PL/SQL.
- Work successfully in fast-paced environments, both independently and in collaborative team settings.
- Strong background in mathematics with very good analytical and problem-solving skills.
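The sketch below illustrates the Spark-to-HDFS loading pattern referenced above: transforming a DataFrame and writing it to HDFS as Snappy-compressed Parquet. It is a minimal, hedged example, not project code; the input path, column name and output location are placeholders.

```scala
// Minimal sketch: transform a DataFrame and land it in HDFS as Snappy Parquet.
// Paths and the customer_id column are assumed for illustration only.
import org.apache.spark.sql.SparkSession

object LoadToHdfs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("LoadToHdfs").getOrCreate()

    // Read CSV with a header, keep only rows with a valid key, write to HDFS.
    val df = spark.read.option("header", "true").csv("hdfs:///landing/customers.csv")
    df.filter(df("customer_id").isNotNull)
      .write
      .option("compression", "snappy")
      .parquet("hdfs:///warehouse/customers_parquet")

    spark.stop()
  }
}
```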
TECHNICAL SKILLS:
Big Data Technologies: HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, Scala, Spark, Kafka, Flume, Ambari.
Hadoop Distributions: Cloudera CDH, Hortonworks HDP, MapR.
Databases: Oracle 10g/11g, PL/SQL, MySQL, MS SQL Server 2012, DB2
Language: C, C++, Java, Scala, Python
AWS Components: IAM, S3, EMR, EC2, Lambda, Route 53, CloudWatch, SNS
Development Methodologies: Agile, Waterfall
Build Tools: Maven, Gradle, Jenkins.
NOSQL Databases: HBase, Cassandra, MongoDB, DynamoDB
IDE Tools: Eclipse, NetBeans, IntelliJ
Modelling Tools: Rational Rose, StarUML, Visual Paradigm for UML
Architecture: Relational DBMS, Client-Server Architecture
Cloud Platforms: AWS Cloud
BI Tools: Tableau
Operating System: Windows 7/8/10, Vista, UNIX, Linux, Ubuntu, Mac OS X
PROFESSIONAL EXPERIENCE:
Confidential, Livonia, MI
Big Data Developer
Responsibilities:
- Worked on the Hortonworks HDP 2.5 distribution.
- Involved in review of functional and non-functional requirements.
- Involved in importing data from IBM DB2 into HDFS using Sqoop and created Hive external tables.
- Developed Spark scripts in Scala as per the requirements.
- Loaded the data into Spark RDDs and performed in-memory computation to generate the output response.
- Performed different types of transformations and actions on the RDDs to meet the business requirements.
- Involved in loading data from UNIX file system to HDFS.
- Developed multiple MapReduce jobs in Scala for data cleaning and pre-processing.
- Involved in managing and reviewing Hadoop log files.
- Exported the analyzed data to the relational databases using Sqoop to generate reports for the BI team.
- Involved in writing shell scripts for execution of HiveQL.
- Analyzed NDW data based on business logic and visualized the data in Tableau.
- Deployed Tableau dashboards to Tableau Server and scheduled data loads.
- Migrated existing SQL queries to Spark SQL and executed Spark jobs in cluster mode.
- Implemented best-offer logic in Spark by writing Spark UDFs in Scala (see the sketch following this section).
- Created calculated fields in Tableau for aggregation and transformation of data.
- Involved in scheduling Oozie workflow engine to run Hive queries.
Environment: Hortonworks, Hadoop, HDFS, Sqoop, Hive, Oozie, ZooKeeper, NoSQL, Shell Scripting, Scala, Spark, Spark SQL, Tableau, Tableau Server.
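The sketch below illustrates the Spark UDF approach referenced above for the best-offer logic. It is a hedged example only: the pricing rule, column names and sample rows are invented stand-ins, not the project's actual logic.

```scala
// Minimal sketch of registering and applying a Scala UDF in Spark SQL.
// The "pick the lower price" rule and all column names are hypothetical.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object BestOfferUdf {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("BestOfferUdf").getOrCreate()
    import spark.implicits._

    // Placeholder best-offer rule: take the lower of list and promo price.
    val bestOffer = udf((listPrice: Double, promoPrice: Double) =>
      math.min(listPrice, promoPrice))

    val offers = Seq((101, 49.99, 44.99), (102, 19.99, 21.00))
      .toDF("product_id", "list_price", "promo_price")

    // Apply the UDF as a new column and inspect the result.
    offers.withColumn("best_price", bestOffer($"list_price", $"promo_price"))
      .show()

    spark.stop()
  }
}
```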
Confidential, Middletown, NJ
Hadoop Developer
Responsibilities:
- Worked on the Hortonworks HDP 2.5 distribution.
- Responsible for building scalable, distributed data solutions using Hadoop.
- Involved in importing data from Microsoft SQL Server, MySQL and Teradata into HDFS using Sqoop.
- Played a key role in dynamic partitioning and bucketing of the data stored in Hive.
- Wrote HiveQL queries integrating different tables and creating views to produce result sets.
- Collected the log data from Web Servers and integrated into HDFS using Flume.
- Experienced in loading and transforming large sets of structured and unstructured data.
- Developed MapReduce programs for data cleaning and transformation and loaded the output into Hive tables in different file formats.
- Wrote several MapReduce-style jobs in Spark using the Scala programming language.
- Wrote MapReduce programs to handle semi-structured and unstructured data such as JSON, Avro data files and sequence files for log files.
- Involved in loading data into the HBase NoSQL database.
- Built, managed and scheduled Oozie workflows for end-to-end job processing.
- Experienced in extending Hive and Pig core functionality by writing custom UDFs using Java.
- Analyzed large volumes of structured data using Spark SQL.
- Wrote shell scripts to execute HiveQL.
- Wrote automated shell scripts in Linux/Unix environments using Bash.
- Analyzed Hive log files and fixed issues to ensure all jobs ran successfully.
- Migrated HiveQL queries to Spark SQL to improve performance.
- Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames and loaded it into HBase.
- Experienced in using the DataStax Spark Cassandra Connector to store data in and retrieve data from the Cassandra database.
- Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames and loaded it into Cassandra (see the sketch following this section).
- Imported real-time weblogs using Flume and ingested the data into Spark Streaming.
- Used Flume to collect, aggregate and push log data from different log servers.
- Extensive experience tuning Hive queries, using map-side (in-memory) joins for faster execution and appropriate resource allocation.
- Implemented Flume, Spark, and Spark Streaming framework for real time data processing.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
- Generated reports using Tableau.
- Migrated MapReduce jobs to Spark jobs to achieve better performance.
- Used GitHub as the repository for committing and retrieving code, and Jenkins for continuous integration.
Environment: Hortonworks, Hadoop, HDFS, Pig, Sqoop, Hive, Oozie, ZooKeeper, NoSQL, HBase, Shell Scripting, Scala, Spark, Spark SQL, Git, GitHub.
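The sketch below illustrates the Spark Streaming to Cassandra flow referenced above, using the DataStax Spark Cassandra Connector. It is a hedged example: a socket source stands in for the Flume feed used on the project, and the keyspace, table and column names are assumptions made for illustration.

```scala
// Minimal sketch: stream text records, parse them, and persist each micro-batch
// to Cassandra via the DataStax Spark Cassandra Connector. Names are placeholders.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import com.datastax.spark.connector._
import com.datastax.spark.connector.streaming._

object StreamToCassandra {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("WeblogStreamToCassandra")
      .set("spark.cassandra.connection.host", "127.0.0.1") // assumed host
    val ssc = new StreamingContext(conf, Seconds(10))       // 10-second micro-batches

    // Socket source as a stand-in for the Flume-fed stream.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Parse "userId,url,timestamp" records and write each batch to Cassandra.
    lines.map(_.split(","))
      .filter(_.length == 3)
      .map(f => (f(0), f(1), f(2).toLong))
      .saveToCassandra("weblogs_ks", "page_views",
        SomeColumns("user_id", "url", "event_ts"))

    ssc.start()
    ssc.awaitTermination()
  }
}
```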
Confidential, Raleigh, NC
Hadoop Developer
Responsibilities:
- Worked on Cloudera CDH distribution.
- Hands-on experience with cloud services, specifically Amazon Web Services (AWS).
- Created data pipelines for different events to load the data from DynamoDB to AWS S3 bucket and then into HDFS location.
- Developed Sqoop scripts to import data from relational sources and handled incremental loading.
- Created Hive external tables for data in HDFS locations.
- Wrote Hive queries for data analysis to meet the business requirements.
- Used various Hive optimization techniques such as partitioning, bucketing, map joins, merging small files and vectorization.
- Processed complex/nested JSON and CSV data using Spark DataFrames (see the sketch following this section).
- Used Spark as an ETL tool.
- Developed various Spark jobs in Scala for data analysis on different data formats.
- Automatically scaled up EMR instances based on data volume.
- Applied transformation rules on top of DataFrames.
- Imported real time weblogs using Kafka as a messaging system and ingested the data to Spark Streaming.
- Used Kafka to load data into HDFS and move data into NoSQL databases.
- Deployed the project on Amazon EMR with S3 Connectivity.
- Used Amazon EMR to process Big Data across a Hadoop cluster of virtual servers on Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
- Loaded the data into Simple Storage Service (S3) in the AWS Cloud.
- Good knowledge of using Amazon load balancers with auto scaling for EC2 servers.
- Executed the Spark jobs in Amazon EMR.
- Generated the reports using Tableau.
Environment: Data Pipeline, Hive, Impala, Amazon Elastic Compute Cloud, Amazon Load Balancer, Amazon Simple Storage Service, Amazon EMR, Spark, Spark SQL, Cloudera, IntelliJ IDE.
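The sketch below illustrates processing nested JSON from S3 with Spark DataFrames, as referenced above. It is a hedged example only: the bucket, paths and field names are placeholders and assume a hypothetical orders schema with a nested items array.

```scala
// Minimal sketch: read nested JSON from S3, flatten it, write Parquet back to S3.
// Bucket, paths, and the order/items schema are illustrative assumptions.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

object NestedJsonToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("NestedJsonToParquet")
      .getOrCreate()

    // Read multi-line nested JSON dropped into S3 by the data pipeline.
    val raw = spark.read
      .option("multiLine", "true")
      .json("s3://example-bucket/incoming/events/")

    // Explode the nested array and project only the fields the reports need.
    val flat = raw
      .withColumn("item", explode(col("order.items")))
      .select(
        col("order.id").as("order_id"),
        col("item.sku"),
        col("item.price"))

    // Write back to S3 as Parquet for downstream querying.
    flat.write.mode("overwrite").parquet("s3://example-bucket/curated/orders/")
    spark.stop()
  }
}
```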
Confidential, Portland, Oregon
Hadoop Developer
Responsibilities:
- Worked on Cloudera CDH distribution.
- Worked on Cluster size of 50-100 nodes.
- Loading data from different relational databases to HDFS using Sqoop.
- Implemented Daily jobs that automate parallel tasks of loading the data into HDFS using Oozie coordinator jobs.
- Involved in review of functional and non-functional requirements.
- Created External Hive tables and executed complex Hive queries on them using Hive QL.
- Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping and aggregation and how they translate into MapReduce jobs.
- Used Spark for transformations, event joins and some aggregations before storing the data in HDFS (see the sketch following this section).
- Troubleshot and resolved data quality issues and maintained a high level of accuracy in the data being reported.
- Analyzed large data sets to determine the optimal way to aggregate them.
- Used Oozie to automate and schedule business workflows that invoke Sqoop, MapReduce and Pig jobs as per the requirements.
- Loaded data from UNIX file system to HDFS and written Hive User Defined Functions.
- Involved in processing ingested raw data using Apache Pig.
- Involved in migrating HiveQL into Impala to minimize query response time.
- Used Pig and Spark as ETL tools.
- Involved in creating UDFs in Spark using the Scala programming language.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Created Hive-HBase tables, using Hive for the metastore and HBase for data storage in row-key format.
- Implemented Impala as a database management system for data stored in Hadoop clusters, placing less stress on the CPU than Hive.
- Worked on different file formats like JSON, Avro, ORC and Parquet, and compression codecs like Snappy, zlib and LZ4.
- Involved in writing shell scripts for exporting log files to the Hadoop cluster through an automated process.
- Gained knowledge in creating Tableau dashboards for reporting analyzed data.
- Expertise with NoSQL databases like HBase; loaded the data into HBase.
- Experienced in managing and reviewing the Hadoop log files.
- Used GitHub as the repository for committing and retrieving code, and Jenkins for continuous integration.
- Involved in creating and maintaining technical documentation for MapReduce, Hive, Sqoop and Spark jobs along with the Hadoop clusters, and reviewing it to fix post-production issues.
Environment: HDFS, MapReduce, Sqoop, Hive, Pig, Oozie, Cloudera, MySQL, Eclipse, Spark, Git, GitHub, Jenkins.
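The sketch below illustrates the kind of Spark join and aggregation performed before storing results in HDFS, as referenced above. It is a hedged example under assumed inputs: the table paths, columns and metrics are placeholders, not the project's actual data model.

```scala
// Minimal sketch: join an event table to a dimension table, aggregate, and
// write partitioned Parquet to HDFS. Paths and columns are hypothetical.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{count, lit, sum}

object EventAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("EventAggregation").getOrCreate()

    val clicks = spark.read.parquet("hdfs:///data/raw/clicks") // assumed path
    val users  = spark.read.parquet("hdfs:///data/raw/users")  // assumed path

    // Join click events to the user dimension and aggregate per region per day.
    val daily = clicks.join(users, Seq("user_id"))
      .groupBy("region", "event_date")
      .agg(count(lit(1)).as("click_count"), sum("revenue").as("total_revenue"))

    // Land the aggregated output in HDFS, partitioned by day.
    daily.write.mode("overwrite")
      .partitionBy("event_date")
      .parquet("hdfs:///data/curated/daily_region_metrics")

    spark.stop()
  }
}
```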
Confidential
Java Developer
Responsibilities:
- Involved in all the phases of the life cycle of the project from requirement gathering to quality assurance testing.
- Developed UML diagrams using Rational Rose.
- Involved in developing applications using Java, JSP, Servlets and Swing.
- Developed the UI using HTML, CSS, Ajax and jQuery, and developed business logic and interfacing components using business objects, XML and JDBC.
- Created applications, connection pools, deployment of JSP & Servlets.
- Used Oracle, MySQL database for storing user information.
- Developed the backend for web applications using PHP.
- Used Eclipse as the IDE; configured and deployed the application onto the WebLogic application server using Maven build scripts to automate the build and deployment process.
Environment: Java, JSP, Servlets, Swing, Oracle, MySQL, HTML, CSS, PHP, Eclipse.
Confidential
Java Developer
Responsibilities:
- Hands-on experience in all phases of the software development life cycle (SDLC).
- Developed UML diagrams using Rational Rose
- Created UI for web applications using HTML, CSS.
- Created desktop applications using J2EE and Swing.
- Developed the process using Waterfall model.
- Created SQL scripts for Oracle database.
- Executed test cases manually to verify expected results.
- Used JDBC to establish connection between the database and the application.
- Involved in designing, coding, debugging, documenting and maintaining the applications.
Environment: Rational Rose, HTML, CSS, J2EE, Swing, SQL, Oracle 9i, Java, Servlets.