
Hadoop/Big Data Developer Resume


North Haven, Connecticut

SUMMARY

  • Over 6 years of industry experience in IT, focused on Big Data, with hands-on expertise in development on the Hadoop ecosystem and Java
  • Experience working with different cloud infrastructures such as Amazon Web Services (AWS), Azure and Google Cloud Platform (GCP)
  • Worked with Apache Hadoop as well as the enterprise distributions from Cloudera and Hortonworks; good knowledge of the MapR distribution
  • Used PySpark for structured and semi-structured data and worked with NoSQL databases including MongoDB, HBase and Cassandra
  • Ingested data into Hadoop from data sources such as Oracle and MySQL using Sqoop; created Sqoop jobs with incremental loads to populate Hive external tables.
  • Experience with Flume for faster content consumption; used multiple Flume agents to collect data.
  • Experienced in using distributed computing architectures such as AWS products (e.g. EC2, Redshift, EMR, Elasticsearch), Hadoop, Python and Spark, with effective use of MapReduce, SQL and Cassandra to solve big data problems
  • Proven skills in writing MapReduce programs for analyzing structured and unstructured data using the Apache Hadoop Java API
  • Experience in front end technologies like HTML, CSS and JavaScript
  • Used tools like Tableau for analytics on data in the cloud
  • Experience in administrative tasks such as installing Hadoop and ecosystem components like Hive and Pig in distributed mode.
  • Experienced in setting up Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as the storage mechanism.
  • Knowledge of job workflow scheduling and monitoring tools like Oozie (Hive, Pig) and ZooKeeper (HBase)
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
  • Experience in handling Hive tables using Spark SQL (a short sketch follows this list)
  • Knowledge of YARN configuration; worked on Spark using Scala on a cluster for analytics, installed on top of Hadoop, and built advanced analytical applications using Spark with Hive and SQL/Oracle.
  • Performed clustering, regression and classification using machine learning libraries such as scikit-learn and Spark MLlib
  • Integrated Apache Storm with Kafka to perform web analytics.
  • Experience in sharding large datasets using hierarchical keys and working with column-oriented databases using DynamoDB
  • Knowledge of deep learning concepts such as CNNs, RNNs and LSTMs; used Keras and TensorFlow to build deep learning models
  • Experience using GitHub for code reviews; worked with version control tools such as CVS, Git and SVN.
  • Implemented web services for network-related applications in Java.
  • Hands-on experience working with software methodologies such as Waterfall and Agile, and knowledge of DevOps
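
For illustration, a minimal PySpark sketch of handling a Hive table with Spark SQL, as referenced in the bullets above; the session settings and the table and column names (sensor_events, event_date) are hypothetical.

```python
# Minimal sketch: querying a Hive table with Spark SQL from PySpark.
# The table and column names below are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-spark-sql-sketch")
         .enableHiveSupport()   # lets Spark read tables from the Hive metastore
         .getOrCreate())

# Aggregate an existing Hive table with plain Spark SQL
daily_counts = spark.sql("""
    SELECT event_date, COUNT(*) AS events
    FROM sensor_events
    GROUP BY event_date
""")

daily_counts.show()
```

The same table can also be loaded as a DataFrame via spark.table("sensor_events") for programmatic transformations.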

TECHNICAL SKILLS

Languages: Python, Java, Scala, SQL

Operating Systems: Windows, Linux, Mac OS, UNIX

Big Data Technologies: HDFS, MapReduce, YARN, Pig, Hive, Sqoop, HBase, Cassandra, MongoDB, Spark, ZooKeeper, Oozie, Flume

Cloud Platforms: AWS (Amazon Web Services), GCP (Google Cloud Platform)

AWS Hadoop Services: EMR, S3, EC2, Data Pipeline, Redshift

Web Technologies: JavaScript, CSS, HTML

Version Control: Git, SVN, CVS, Bitbucket

Reporting: Tableau

Development Methodologies: Waterfall, Agile, DevOps (Knowledge)

PROFESSIONAL EXPERIENCE

Confidential, North Haven, Connecticut

Hadoop/Big Data Developer

Responsibilities:

  • Developed Spark applications using Scala and Java and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources
  • Used the Kafka consumer API in Scala to consume data from Kafka topics
  • Created Sqoop jobs to bring data from Oracle into HDFS and created external tables in Hive.
  • Knowledge of PySpark; used Hive to analyze sensor data and cluster users based on their behavior in the events.
  • Created external tables in Hive and saved them in ORC file format.
  • Ingested data from RDBMS sources, performed data transformations and then exported the transformed data to Cassandra as per the business requirements.
  • Built a data pipeline using Pig to store data onto HDFS.
  • Worked on HiveQL for data analysis, importing the structured data into specific tables for reporting.
  • Wrote Python scripts to parse XML documents and load the data into a database (see the XML-parsing sketch after this list).
  • Experience working with Hive to create value-added procedures; also wrote Hive UDFs to make the functions reusable across different models.
  • Loaded the dataset into Hive for ETL (Extract, Transform and Load) operations.
  • Implemented a Kafka model that pulls the latest records into Hive external tables.
  • Involved in developing code to write canonical-model JSON records from numerous input sources to Kafka queues.
  • Developed a Spark script with Apache NiFi to do the source-to-target mapping according to the design document developed by the designers
  • Worked extensively on AWS components like Elastic Map Reduce (EMR), Elastic Compute Cloud (EC2), Simple Storage Service (S3)
  • Used Amazon CloudWatch to monitor and track resources on AWS
  • Developed DataFrames for data transformation rules.
  • Developed Spark SQL queries to join source tables with multiple driving tables and create a target table in Hive (see the Spark SQL sketch after this list).
  • Optimized the PySpark code for better performance.
  • Developed a Spark application to do the source-to-target mapping.
  • Involved in running Hadoop streaming jobs to process terabytes of text data; worked with different file formats such as Text, SequenceFile, Avro, ORC and Parquet.
  • Collected data using Spark Streaming and dumped it into HBase
  • Experience with Jupyter Notebook for Spark SQL and with scheduling cron jobs that use spark-submit
  • Developed Python scripts to start and end jobs smoothly within a UC4 workflow
  • Worked on a NiFi data pipeline to process large data sets and configured lookups for data validation and integrity.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDDs/MapReduce in Spark 2.0.0 for data aggregation, queries and writing data back into the RDBMS through Sqoop.
  • Exposure to using Apache Kafka to develop data pipelines of logs as a stream of messages using producers and consumers.
  • Developed Python scripts to clean the raw data.
  • Experienced in writing Spark applications in Scala and Python.
  • Developed and analyzed the SQL scripts and designed the solution to be implemented using PySpark
  • Worked in production support to monitor and debug issues that caused problems while scheduled jobs were running.
  • Used Bitbucket for version control, JIRA for bug tracking and Control-M for scheduling the jobs
  • Worked in Agile methodology
  • Fetched and generated monthly reports and visualized them using Tableau
  • Experienced in containerizing the NiFi pipeline on EC2 nodes, integrated with Spark, Kafka and Postgres
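
A minimal sketch of the kind of Python XML-parsing-and-load script mentioned in the list above; sqlite3 is used purely for illustration, and the file name, tag names and schema are hypothetical.

```python
# Minimal sketch: parse an XML document and load the records into a database.
# File name, element names and table schema are hypothetical; sqlite3 stands in
# for the actual target database.
import sqlite3
import xml.etree.ElementTree as ET

tree = ET.parse("readings.xml")                      # hypothetical input file
rows = [(r.findtext("id"), r.findtext("value"))      # pull fields from each <reading>
        for r in tree.getroot().iter("reading")]

conn = sqlite3.connect("staging.db")
conn.execute("CREATE TABLE IF NOT EXISTS readings (id TEXT, value TEXT)")
conn.executemany("INSERT INTO readings (id, value) VALUES (?, ?)", rows)
conn.commit()
conn.close()
```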
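
A minimal PySpark sketch of the Spark SQL source-to-target pattern described in the list above (joining a source table with a driving table and writing the result to a Hive table); all table and column names are hypothetical.

```python
# Minimal sketch: join a source table with a driving table and write the result
# to a Hive target table in ORC format. All names are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("source-to-target-join-sketch")
         .enableHiveSupport()
         .getOrCreate())

source = spark.table("staging.orders")        # hypothetical source table
driving = spark.table("reference.customers")  # hypothetical driving table

target = (source.join(driving, on="customer_id", how="inner")
                .select("order_id", "customer_id", "customer_name", "order_total"))

(target.write
       .mode("overwrite")
       .format("orc")
       .saveAsTable("curated.customer_orders"))
```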

Environment: Hadoop, Hive, Linux, Sqoop, Oracle, Spark, PySpark, Shell Scripting, Agile methodology, UC4, Kafka, HBase, JIRA, NiFi, Tableau, Jupyter Notebook, AWS tools (S3, EMR, EC2, CloudWatch)

Confidential, Washington DC

Hadoop/Spark Developer

Responsibilities:

  • Experienced in writing Hive scripts to create tables in Hive
  • Implemented a Kafka model that pulls the latest records into Hive external tables.
  • Used MongoDB for CRUD operations such as inserting, updating and deleting data
  • Worked with MongoDB concepts such as locking, transactions, indexes, sharding, replication and schema design
  • Implemented daily cron jobs that automate parallel tasks of loading data into HDFS and pre-processing it with Pig using Oozie coordinator jobs; configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java and NiFi for data cleaning and preprocessing.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
  • Explored Spark for improving the performance and optimization of the existing algorithms in Hadoop using Spark Context, Spark SQL, Apache Kafka, DataFrames, pair RDDs and Spark on YARN.
  • Created Spark SQL queries to join three Hive tables, write the result to a Hive table and store it on S3.
  • Set up Sqoop for batch processing across various data sources, data targets and data formats
  • Started using Apache NiFi to copy data from the local file system to HDFS.
  • Worked on Spark SQL to join multiple Hive tables and write them to a final Hive table stored on S3.
  • Used Python to extract weekly information from XML files.
  • Effectively migrated data from different source systems to build a secure data warehouse
  • Automated the ETL processes using UNIX shell scripting
  • Performed data cleaning and pre-processing using multiple MapReduce jobs in Pig and Hive
  • Expertise in importing and exporting streaming data into HDFS using stream processing platforms like Flume and Kafka
  • Understanding of ZooKeeper configuration to provide cluster coordination services
  • Experience in launching EC2 instances in Amazon EMR using the console
  • Created data models for the client's transactional logs and analyzed the data from Cassandra tables for quick searching, sorting and grouping using the Cassandra Query Language (CQL)
  • Created UDFs to calculate the pending payment for the given customer data based on the last day of every month and used them in Hive scripts (a sketch follows this list)
  • Involved in integrating tools like Apache Kafka and Elasticsearch with existing source systems
  • Involved in loading data from Linux file systems, Apache NiFi, servers and Java web services using Kafka producers and consumers; created new variables in Splunk and assigned new regular expressions to them to differentiate them from existing ones.
  • Used Kerberos, integrated with the Hadoop cluster, to make it stronger and more secure against unauthorized access
  • Experience in writing shell scripts to run the jobs in parallel and increase performance
  • Experience in querying data using Spark SQL on top of the Spark engine, implementing Spark RDDs in Python
  • Used different file formats such as text files, SequenceFiles, Avro and Optimized Row Columnar (ORC)
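
A sketch of the last-day-of-month pending-payment idea from the list above, expressed here as a PySpark UDF rather than the Hive UDF used on the project; the column names and sample rows are hypothetical.

```python
# Minimal sketch: a PySpark UDF computing days remaining until month end,
# applied to a small hypothetical payments DataFrame.
import calendar
from datetime import date

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import IntegerType

spark = SparkSession.builder.appName("pending-payment-udf-sketch").getOrCreate()

@udf(returnType=IntegerType())
def days_until_month_end(d):
    """Days between the given date and the last day of its month."""
    last_day = calendar.monthrange(d.year, d.month)[1]
    return last_day - d.day

payments = spark.createDataFrame(
    [(1, date(2023, 1, 10), 250.0), (2, date(2023, 1, 28), 400.0)],
    ["customer_id", "due_date", "pending_amount"],
)

payments.withColumn("days_to_month_end", days_until_month_end(col("due_date"))).show()
```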

Environment: Hadoop, Hive, Linux, Spark SQL, MongoDB, Spark, Scala, Hibernate, Agile methodology, MapR, Cassandra Query Language, Oozie, Kafka, Flume, NiFi, Pig, Kerberos, ZooKeeper, Python, HDFS.

Confidential, Carrollton, TX

Hadoop/Spark Developer

Responsibilities:

  • Involved in writing Unix/Linux shell scripts for scheduling jobs and in writing Pig scripts and HiveQL.
  • Involved in Database design and developing SQL Queries, stored procedures on MySQL
  • Set up Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
  • Performed optimization tasks such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and doing map-side joins (a sketch follows this list).
  • Used Hive and Impala to query the data in HBase
  • Created Hive tables, was involved in data loading and wrote Hive UDFs; developed Hive UDFs for rating aggregation
  • Developed numerous MapReduce jobs in Scala for data cleansing and analyzing data in Impala.
  • Experience creating data pipelines for ingestion and aggregation events, loading consumer response data from an AWS S3 bucket into Hive external tables in HDFS to serve as a feed for Tableau dashboards.
  • Used EMR (Elastic MapReduce) to perform big data operations in AWS
  • Developed UI application using AngularJS, integrated with Elastic Search to consume REST.
  • Provided cluster coordination services through ZooKeeper
  • Used Sqoop to extract data from Oracle and MySQL databases into HDFS
  • Used the Spring data access framework to automatically acquire and release database resources, with exception handling through the Spring data access hierarchy for better handling of database connections with JDBC.
  • Worked with team of Developers and Testers to resolve the issues with the server timeouts and database connection pooling issues.
  • Implemented several JUnit test cases
  • Built web logging for the application to better trace the data flow on the application server using Log4j
  • Responded to requests from technical team members to prepare TAR and configuration files for production migration.
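
A minimal PySpark sketch of two of the optimizations mentioned in the list above: broadcasting a small lookup table (the Spark counterpart of a map-side join) and writing a Hive table partitioned by date; the table, column and partition names are hypothetical.

```python
# Minimal sketch: broadcast (map-side style) join of a large fact table with a
# small lookup table, then persist a date-partitioned Hive table. All names
# are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = (SparkSession.builder
         .appName("join-and-partition-sketch")
         .enableHiveSupport()
         .getOrCreate())

facts = spark.table("raw.transactions")     # large fact table
lookup = spark.table("raw.store_lookup")    # small dimension table

# Broadcasting the small table avoids shuffling the large one
enriched = facts.join(broadcast(lookup), on="store_id", how="left")

# Partitioning by date speeds up downstream queries that filter on txn_date
(enriched.write
         .mode("overwrite")
         .partitionBy("txn_date")
         .format("orc")
         .saveAsTable("curated.transactions_by_date"))
```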

Environment: Linux, Pig, HiveQL, MySQL, MapR, Scala, HDFS, Impala, AWS Tools (EMR, EC2, S3), ZooKeeper, Sqoop, JUnit, Log4j.

Confidential

JAVA/Hadoop/Spark Developer

Responsibilities:

  • Involved in requirements analysis and the design of an object-oriented domain model
  • Implemented test scripts to support test-driven development and continuous integration
  • Experience in importing and exporting data into HDFS and Hive using Sqoop
  • Developed MapReduce programs to clean and aggregate the data (a sketch follows this list)
  • Worked in complete SDLC phases including Requirements, Specification, Design, Implementation and Testing
  • Developed Spring and Hibernate data layer components for application
  • Developed profile view web pages (add, edit) using HTML, CSS, jQuery and JavaScript
  • Developed the application using Maven scripts
  • Developed the mechanism for logging and debugging with Log4j
  • Involved in developing database transactions through JDBC
  • Used GIT for version control
  • Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
  • Used Oracle as the database for query execution and was involved in writing SQL scripts and PL/SQL code for procedures and functions
  • Developed front-end applications that interact with mainframe applications using J2C connectors
  • Hands-on experience exporting the results into relational databases using Sqoop for visualization and to generate reports for the BI team
  • Design, development and implementation of JSPs in the presentation layer for submission, application and reference implementation
  • Deployed web, presentation and business components on the Apache Tomcat application server.
  • Involved in post-production support and testing; used JUnit for unit testing of the module
  • Worked in Agile methodology
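
A sketch of the clean-and-aggregate MapReduce idea from the list above, expressed here as a Hadoop Streaming job in Python rather than the Java MapReduce API used on the project; the input format and field positions are hypothetical.

```python
# Minimal sketch: clean-and-count MapReduce logic for Hadoop Streaming.
# Run with the argument "map" as the mapper, with no argument as the reducer.
import sys
from itertools import groupby

def mapper():
    """Drop malformed CSV records and emit (key, 1) pairs."""
    for line in sys.stdin:
        fields = line.rstrip("\n").split(",")
        if len(fields) < 2 or not fields[1].strip():   # skip dirty records
            continue
        print(f"{fields[0].strip().lower()}\t1")

def reducer():
    """Sum counts per key; streaming sorts map output by key before this step."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sys.stdin)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        print(f"{key}\t{sum(int(count) for _, count in group)}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```

In an actual run, the two modes would be wired up through the hadoop-streaming jar's -mapper and -reducer options.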

Environment: HDFS, Hive, Sqoop, Java, Core Java, Maven, HTML, CSS, JavaScript, Git, MapR, JUnit, Agile, Log4j, SQL.
