
Hadoop/Big Data Developer Resume


North Haven, Connecticut

SUMMARY

  • Over 6 years of industry experience in IT, focused on Big Data, with hands-on expertise in development on the Hadoop ecosystem and Java
  • Experience working with different cloud infrastructures such as Amazon Web Services (AWS), Azure and Google Cloud Platform (GCP)
  • Worked with Apache Hadoop as well as the enterprise distributions from Cloudera and Hortonworks; good knowledge of the MapR distribution
  • Used PySpark for structured and semi-structured data and worked with NoSQL databases including MongoDB, HBase and Cassandra
  • Ingested data into Hadoop from data sources such as Oracle and MySQL using Sqoop; created Sqoop jobs with incremental loads to populate Hive external tables.
  • Experience with Flume for faster content consumption; used multiple Flume agents to collect data.
  • Experienced in using distributed computing architectures such as AWS products (e.g. EC2, Redshift, EMR, Elasticsearch), Hadoop, Python and Spark, with effective use of MapReduce, SQL and Cassandra to solve big data problems
  • Proven skills in writing MapReduce programs for analyzing structured and unstructured data using the Apache Hadoop Java API
  • Experience in front end technologies like HTML, CSS and JavaScript
  • Used tools like Tableau for analytics on data in the cloud
  • Experience in administrative tasks such as installing Hadoop and ecosystem components like Hive and Pig in distributed mode.
  • Experienced in setting up Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as the storage mechanism.
  • Knowledge of job workflow scheduling and monitoring tools like Oozie (Hive, Pig) and ZooKeeper (HBase)
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
  • Experience in handling Hive tables using Spark SQL (a short sketch follows this list)
  • Knowledge of YARN configuration; worked on Spark using Scala on a cluster for analytics, installed on top of Hadoop, and built advanced analytical applications using Spark with Hive and SQL/Oracle.
  • Performed clustering, regression and classification using machine learning libraries such as scikit-learn and Spark MLlib
  • Integrated Apache Storm with Kafka to perform web analytics.
  • Experience in sharding large datasets using hierarchical keys and working with column-oriented databases using DynamoDB
  • Knowledge of deep learning concepts such as CNNs, RNNs and LSTMs; used Keras and TensorFlow to build deep learning models
  • Experience using GitHub for code reviews; worked with version control tools such as CVS, Git and SVN.
  • Implemented web services for network-related applications in Java.
  • Hands-on experience working with software methodologies such as Waterfall and Agile, and knowledge of DevOps
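
For illustration, a minimal PySpark sketch of handling a Hive table with Spark SQL, as referenced in the bullets above; the session settings and the table and column names (sensor_events, event_date) are hypothetical.

```python
# Minimal sketch: querying a Hive table with Spark SQL from PySpark.
# The table and column names below are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-spark-sql-sketch")
         .enableHiveSupport()   # lets Spark read tables from the Hive metastore
         .getOrCreate())

# Aggregate an existing Hive table with plain Spark SQL
daily_counts = spark.sql("""
    SELECT event_date, COUNT(*) AS events
    FROM sensor_events
    GROUP BY event_date
""")

daily_counts.show()
```

The same table can also be loaded as a DataFrame via spark.table("sensor_events") for programmatic transformations.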

TECHNICAL SKILLS

Languages: Python, Java, Scala, SQL

Operating Systems: Windows, Linux, Mac OS, UNIX

Big Data Technologies: HDFS, MapReduce, YARN, Pig, Hive, Sqoop, HBase, Cassandra, MongoDB, Spark, ZooKeeper, Oozie, Flume

Cloud Platforms: AWS (Amazon Web Services), GCP (Google Cloud Platform)

AWS Hadoop Services: EMR, S3, EC2, Data Pipeline, Redshift

Web Technologies: JavaScript, CSS, HTML

Version Control: Git, SVN, CVS, Bitbucket

Reporting: Tableau

Development Methodologies: Waterfall, Agile, DevOps (Knowledge)

PROFESSIONAL EXPERIENCE

Confidential, North Haven, Connecticut

Hadoop/Big Data Developer

Responsibilities:

  • Developed Spark applications using Scala and Java and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources
  • Used the Kafka consumer API in Scala to consume data from Kafka topics
  • Created Sqoop jobs to bring data from Oracle into HDFS and created external tables in Hive.
  • Knowledge of PySpark; used Hive to analyze sensor data and cluster users based on their behavior in the events.
  • Created external tables in Hive and saved them in ORC file format.
  • Ingested data from RDBMS sources, performed data transformations and then exported the transformed data to Cassandra as per the business requirements.
  • Built a data pipeline using Pig to store data onto HDFS.
  • Worked on HiveQL for data analysis, importing the structured data into specific tables for reporting.
  • Wrote Python scripts to parse XML documents and load the data into a database (see the XML-parsing sketch after this list).
  • Experience working with Hive to create value-added procedures; also wrote Hive UDFs to make the functions reusable across different models.
  • Loaded the dataset into Hive for ETL (Extract, Transform and Load) operations.
  • Implemented a Kafka model that pulls the latest records into Hive external tables.
  • Involved in developing code to write canonical-model JSON records from numerous input sources to Kafka queues.
  • Developed a Spark script with Apache NiFi to do the source-to-target mapping according to the design document developed by the designers
  • Worked extensively on AWS components like Elastic Map Reduce (EMR), Elastic Compute Cloud (EC2), Simple Storage Service (S3)
  • Used Amazon CloudWatch to monitor and track resources on AWS
  • Developed DataFrames for data transformation rules.
  • Developed Spark SQL queries to join source tables with multiple driving tables and create a target table in Hive (see the Spark SQL sketch after this list).
  • Optimized the PySpark code for better performance.
  • Developed a Spark application to do the source-to-target mapping.
  • Involved in running Hadoop streaming jobs to process terabytes of text data; worked with different file formats such as Text, SequenceFile, Avro, ORC and Parquet.
  • Collected data using Spark Streaming and dumped it into HBase
  • Experience with Jupyter Notebook for Spark SQL and with scheduling cron jobs that use spark-submit
  • Developed Python scripts to start and end jobs smoothly within a UC4 workflow
  • Worked on a NiFi data pipeline to process large data sets and configured lookups for data validation and integrity.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark 2.0.0 for data aggregation, queries and writing data back into the RDBMS through Sqoop.
  • Exposure to using Apache Kafka to develop data pipelines of logs as a stream of messages using producers and consumers.
  • Developed Python scripts to clean the raw data.
  • Experienced in writing Spark applications in Scala and Python.
  • Developed and analyzed the SQL scripts and designed the solution to be implemented using PySpark
  • Worked in production support to monitor and debug issues that caused problems while scheduled jobs were running.
  • Used Bitbucket for version control, JIRA for bug tracking and Control-M for scheduling the jobs
  • Worked in Agile methodology
  • Fetched and generated monthly reports and visualized them using Tableau
  • Experienced in containerizing the NiFi pipeline on EC2 nodes, integrated with Spark, Kafka and Postgres
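
A minimal sketch of the kind of Python XML-parsing-and-load script mentioned in the list above; sqlite3 is used purely for illustration, and the file name, tag names and schema are hypothetical.

```python
# Minimal sketch: parse an XML document and load the records into a database.
# File name, element names and table schema are hypothetical; sqlite3 stands in
# for the actual target database.
import sqlite3
import xml.etree.ElementTree as ET

tree = ET.parse("readings.xml")                      # hypothetical input file
rows = [(r.findtext("id"), r.findtext("value"))      # pull fields from each <reading>
        for r in tree.getroot().iter("reading")]

conn = sqlite3.connect("staging.db")
conn.execute("CREATE TABLE IF NOT EXISTS readings (id TEXT, value TEXT)")
conn.executemany("INSERT INTO readings (id, value) VALUES (?, ?)", rows)
conn.commit()
conn.close()
```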
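
A minimal PySpark sketch of the Spark SQL source-to-target pattern described in the list above (joining a source table with a driving table and writing the result to a Hive table); all table and column names are hypothetical.

```python
# Minimal sketch: join a source table with a driving table and write the result
# to a Hive target table in ORC format. All names are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("source-to-target-join-sketch")
         .enableHiveSupport()
         .getOrCreate())

source = spark.table("staging.orders")        # hypothetical source table
driving = spark.table("reference.customers")  # hypothetical driving table

target = (source.join(driving, on="customer_id", how="inner")
                .select("order_id", "customer_id", "customer_name", "order_total"))

(target.write
       .mode("overwrite")
       .format("orc")
       .saveAsTable("curated.customer_orders"))
```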

Environment: Hadoop, Hive, Linux, Sqoop, Oracle, Spark, PySpark, Shell Scripting, Agile methodology, UC4, Kafka, HBase, JIRA, NiFi, Tableau, Jupyter Notebook, AWS tools (S3, EMR, EC2, CloudWatch)

Confidential, Washington DC

Hadoop/Spark Developer

Responsibilities:

  • Experienced in writing Hive scripts to create tables in Hive
  • Implemented a Kafka model that pulls the latest records into Hive external tables.
  • Used MongoDB for CRUD operations such as inserting, updating and deleting data
  • Worked with MongoDB concepts such as locking, transactions, indexes, sharding, replication and schema design
  • Implemented daily cron jobs that automate parallel tasks of loading data into HDFS and pre-processing it with Pig using Oozie coordinator jobs; configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java and NiFi for data cleaning and preprocessing.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
  • Explored Spark for improving the performance and optimization of the existing algorithms in Hadoop using Spark Context, Spark SQL, Apache Kafka, DataFrames, pair RDDs and Spark on YARN.
  • Created Spark SQL queries to join three Hive tables, write the result to a Hive table and store it on S3.
  • Set up Sqoop for batch processing across various data sources, data targets and data formats
  • Started using Apache NiFi to copy data from the local file system to HDFS.
  • Worked on Spark SQL to join multiple Hive tables and write them to a final Hive table stored on S3.
  • Used Python to extract weekly information from XML files.
  • Effectively migrated data from different source systems to build a secure data warehouse
  • Automated the ETL processes using UNIX shell scripting
  • Performed data cleaning and pre-processing using multiple MapReduce jobs in Pig and Hive
  • Expertise in importing and exporting streaming data into HDFS using stream processing platforms like Flume and Kafka
  • Understanding of ZooKeeper configuration to provide cluster coordination services
  • Experience in launching EC2 instances in Amazon EMR using the console
  • Created data models for the client's transactional logs and analyzed the data from Cassandra tables for quick searching, sorting and grouping using the Cassandra Query Language (CQL)
  • Created UDFs to calculate the pending payment for the given customer data based on the last day of every month and used them in Hive scripts (a sketch follows this list)
  • Involved in integrating tools like Apache Kafka and Elasticsearch with existing source systems
  • Involved in loading data from Linux file systems, Apache NiFi, servers and Java web services using Kafka producers and consumers; created new variables in Splunk and assigned new regular expressions to them to differentiate them from existing ones.
  • Used Kerberos, integrated with the Hadoop cluster, to make it stronger and more secure against unauthorized access
  • Experience in writing shell scripts to run the jobs in parallel and increase performance
  • Experience in querying data using Spark SQL on top of the Spark engine, implementing Spark RDDs in Python
  • Used different file formats such as text files, SequenceFiles, Avro and Optimized Row Columnar (ORC)
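
A sketch of the last-day-of-month pending-payment idea from the list above, expressed here as a PySpark UDF rather than the Hive UDF used on the project; the column names and sample rows are hypothetical.

```python
# Minimal sketch: a PySpark UDF computing days remaining until month end,
# applied to a small hypothetical payments DataFrame.
import calendar
from datetime import date

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import IntegerType

spark = SparkSession.builder.appName("pending-payment-udf-sketch").getOrCreate()

@udf(returnType=IntegerType())
def days_until_month_end(d):
    """Days between the given date and the last day of its month."""
    last_day = calendar.monthrange(d.year, d.month)[1]
    return last_day - d.day

payments = spark.createDataFrame(
    [(1, date(2023, 1, 10), 250.0), (2, date(2023, 1, 28), 400.0)],
    ["customer_id", "due_date", "pending_amount"],
)

payments.withColumn("days_to_month_end", days_until_month_end(col("due_date"))).show()
```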

Environment: Hadoop, Hive, Linux, Spark SQL, MongoDB, Spark, Scala, Hibernate, Agile methodology, MapR, Cassandra Query Language, Oozie, Kafka, Flume, NiFi, Pig, Kerberos, ZooKeeper, Python, HDFS.

Confidential, Carrollton, TX

Hadoop/Spark Developer

Responsibilities:

  • Involved in writing Unix/Linux shell scripts for scheduling jobs and in writing Pig scripts and HiveQL.
  • Involved in Database design and developing SQL Queries, stored procedures on MySQL
  • Set up Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
  • Performed optimization tasks such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and doing map-side joins (a sketch follows this list).
  • Used Hive and Impala to query the data in HBase
  • Created Hive tables, was involved in data loading and wrote Hive UDFs; developed Hive UDFs for rating aggregation
  • Developed numerous MapReduce jobs in Scala for data cleansing and analyzing data in Impala.
  • Experience creating data pipelines for ingestion and aggregation events, loading consumer response data from an AWS S3 bucket into Hive external tables in HDFS to serve as a feed for Tableau dashboards.
  • Used EMR (Elastic MapReduce) to perform big data operations in AWS
  • Developed UI application using AngularJS, integrated with Elastic Search to consume REST.
  • Provided cluster coordination services through ZooKeeper
  • Used Sqoop to extract data from Oracle and MySQL databases into HDFS
  • Used the Spring data access framework to automatically acquire and release database resources, with exception handling through the Spring data access hierarchy for better handling of database connections with JDBC.
  • Worked with team of Developers and Testers to resolve the issues with the server timeouts and database connection pooling issues.
  • Implemented several JUnit test cases
  • Built web logging for the application to better trace the data flow on the application server using Log4j
  • Responded to requests from technical team members to prepare TAR and configuration files for production migration.
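
A minimal PySpark sketch of two of the optimizations mentioned in the list above: broadcasting a small lookup table (the Spark counterpart of a map-side join) and writing a Hive table partitioned by date; the table, column and partition names are hypothetical.

```python
# Minimal sketch: broadcast (map-side style) join of a large fact table with a
# small lookup table, then persist a date-partitioned Hive table. All names
# are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = (SparkSession.builder
         .appName("join-and-partition-sketch")
         .enableHiveSupport()
         .getOrCreate())

facts = spark.table("raw.transactions")     # large fact table
lookup = spark.table("raw.store_lookup")    # small dimension table

# Broadcasting the small table avoids shuffling the large one
enriched = facts.join(broadcast(lookup), on="store_id", how="left")

# Partitioning by date speeds up downstream queries that filter on txn_date
(enriched.write
         .mode("overwrite")
         .partitionBy("txn_date")
         .format("orc")
         .saveAsTable("curated.transactions_by_date"))
```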

Environment: Linux, Pig, HiveQL, MySQL, MapR, Scala, HDFS, Impala, AWS Tools (EMR, EC2, S3), ZooKeeper, Sqoop, JUnit, Log4j.

Confidential

JAVA/Hadoop/Spark Developer

Responsibilities:

  • Involved in requirements analysis and the design of an object-oriented domain model
  • Implemented test scripts to support test-driven development and continuous integration
  • Experience in importing and exporting data into HDFS and Hive using Sqoop
  • Developed MapReduce programs to clean and aggregate the data (a sketch follows this list)
  • Worked in complete SDLC phases including Requirements, Specification, Design, Implementation and Testing
  • Developed Spring and Hibernate data layer components for application
  • Developed profile view web pages (add, edit) using HTML, CSS, jQuery and JavaScript
  • Developed the application using Maven scripts
  • Developed the mechanism for logging and debugging with Log4j
  • Involved in developing database transactions through JDBC
  • Used GIT for version control
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Used Oracle as the database for query execution and was involved in writing SQL scripts and PL/SQL code for procedures and functions
  • Developed front-end applications that interact with mainframe applications using J2C connectors
  • Hands-on experience exporting the results into relational databases using Sqoop for visualization and to generate reports for the BI team
  • Design, development and implementation of JSPs in the presentation layer for submission, application and reference implementation
  • Deployed web, presentation and business components on the Apache Tomcat application server.
  • Involved in post-production support and testing; used JUnit for unit testing of the module
  • Worked in Agile methodology
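
A sketch of the clean-and-aggregate MapReduce idea from the list above, expressed here as a Hadoop Streaming job in Python rather than the Java MapReduce API used on the project; the input format and field positions are hypothetical.

```python
# Minimal sketch: clean-and-count MapReduce logic for Hadoop Streaming.
# Run with the argument "map" as the mapper, with no argument as the reducer.
import sys
from itertools import groupby

def mapper():
    """Drop malformed CSV records and emit (key, 1) pairs."""
    for line in sys.stdin:
        fields = line.rstrip("\n").split(",")
        if len(fields) < 2 or not fields[1].strip():   # skip dirty records
            continue
        print(f"{fields[0].strip().lower()}\t1")

def reducer():
    """Sum counts per key; streaming sorts map output by key before this step."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sys.stdin)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        print(f"{key}\t{sum(int(count) for _, count in group)}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```

In an actual run, the two modes would be wired up through the hadoop-streaming jar's -mapper and -reducer options.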

Environment: HDFS, Hive, Sqoop, Java, Core Java, Maven, HTML, CSS, JavaScript, Git, MapR, JUnit, Agile, Log4j, SQL.
