
Big Data Developer Resume


New York City, New York

SUMMARY

  • Overall 8+ years of IT experience across Java, SQL, ETL, and Big Data; passionate about working in Big Data environments.
  • 4+ years of experience in Big Data, Hadoop, and NoSQL technologies in domains such as Insurance, Finance, and Health Care.
  • Strong knowledge of the Hadoop architecture and the functioning of its components, including HDFS, NameNode, DataNode, JobTracker, TaskTracker, MapReduce, and Spark.
  • Extensive experience providing Big Data solutions using Hadoop 2.x, HDFS, MR2, YARN, Kafka, Pig, Hive, Sqoop, HBase, Cloudera Manager, Hortonworks, ZooKeeper, Oozie, and Hue.
  • Experience importing and exporting data with Sqoop between HDFS/Hive/HBase and relational database systems; skilled in data migration and data generation in the Big Data ecosystem.
  • Experienced in building highly scalable Big Data solutions on multiple Hadoop distributions (Cloudera, Hortonworks) and NoSQL platforms (HBase).
  • Implemented Big Data batch processes using Hadoop, MapReduce, YARN, Pig, and Hive.
  • Hands-on experience in in-memory data processing with Apache Spark using Scala and Python (see the sketch after this list).
  • Worked with Spark on EMR clusters alongside other Hadoop applications, leveraging the EMR File System (EMRFS) to access data directly in Amazon S3.
  • Interacted with clients to understand their business problems related to Big Data, Cloud Computing, and NoSQL technologies.
  • Experienced in using Kafka as a distributed publish-subscribe messaging system.
  • Good experience in writing Pig scripts and Hive Queries for processing and analyzing large volumes of data.
  • Developed Spark scripts using Scala shell commands as per requirements.
  • Experience optimizing MapReduce jobs using Combiners and Partitioners to deliver the best results.
  • Extended Hive and Pig core functionality by writing custom UDFs.
  • Good knowledge of AWS services such as EMR and EC2, which provide fast and efficient processing of Big Data.
  • Experience understanding Hadoop security requirements and integrating with Kerberos authentication and authorization infrastructure.
  • Strong scripting skills in Python and Unix shell.
  • Experience in managing and reviewing Hadoop log files.
  • Hands-on experience in application development using RDBMS and Linux shell scripting.
  • Good working experience in Agile/Scrum methodologies, including daily scrum calls with clients to discuss project analysis, specifications, and development.
  • Ability to work independently as well as in a team and able to effectively communicate with customers, peers and management at all levels in and outside the organization.
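
Below is a minimal, illustrative sketch of the in-memory Spark processing in Scala referenced above; the input path and column names are placeholders rather than code from a specific engagement.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{avg, col}

    object ClaimsSummary {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("claims-summary").getOrCreate()

        // Cache the dataset in memory so repeated aggregations avoid re-reading HDFS
        val claims = spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv("hdfs:///data/claims.csv")   // placeholder path
          .cache()

        // Example aggregation: average claim amount per state
        claims.groupBy("state")
          .agg(avg(col("claim_amount")).alias("avg_claim"))
          .show(20)

        spark.stop()
      }
    }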

TECHNICAL SKILLS

Big Data/Hadoop ecosystem: HDFS, MapReduce, YARN, Apache NiFi, Hive, Pig, HBase, Impala, Zookeeper, Sqoop, Flume, Oozie, Spark, Apache Phoenix, Zeppelin, EMR.

Programming Languages: C, C++, Java, Scala, SQL, PL/SQL, Python, Linux shell scripts.

Methodologies: Agile, Scrum, Waterfall

NoSQL Databases: HBase, Cassandra, MongoDB

Database: Oracle 10g, Teradata, DB2, MS Azure.

Tools Used: Eclipse, IntelliJ, Git, PuTTY, WinSCP

Operating systems: Windows, Unix, Linux, Ubuntu

PROFESSIONAL EXPERIENCE

Confidential, New York City, New York

Big Data Developer

Responsibilities:

  • Working in an Agile environment, successfully completed stories related to ingestion, transformation, and publication of data on time.
  • Performed validations and consolidations on imported data, including data migration and data generation.
  • Ingested data sets from different databases and servers using the Sqoop import tool and an MFT (Managed File Transfer) inbound process, with Elasticsearch.
  • Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
  • Developed Spark scripts using Scala shell commands as per requirements.
  • Used Spark Streaming to consume topics from the distributed messaging source Kafka and periodically push batches of data to Spark for real-time processing into Elasticsearch (see the sketch after this list).
  • Worked with teams to analyze anomaly detection and ratings of the data.
  • Developed complex HiveQL queries using the JSON SerDe.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Imported real-time data into Hadoop using Kafka and Elasticsearch, and implemented an Oozie job for daily imports.
  • Wrote Pig Latin Scripts and Hive Queries using Avro schemas to transform the Data sets in HDFS.
  • As part of support, responsible for troubleshooting MapReduce, Pig, and Hive jobs.
  • Worked on performance tuning of Hive & Pig Jobs.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data in Amazon EMR.
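
A minimal sketch in Scala of the Spark Streaming + Kafka consumer pattern described above; the broker address, topic name, and batch interval are illustrative placeholders, and the real job went on to index the results into Elasticsearch.

    import org.apache.kafka.clients.consumer.ConsumerConfig
    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object KafkaToSpark {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("kafka-ingest")
        val ssc = new StreamingContext(conf, Seconds(30))   // 30-second micro-batches

        val kafkaParams = Map[String, Object](
          ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG -> "broker1:9092",   // placeholder broker
          ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
          ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG -> classOf[StringDeserializer],
          ConsumerConfig.GROUP_ID_CONFIG -> "ingest-group")

        // Subscribe to an illustrative topic name
        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

        // Count records per micro-batch; the real job transformed and indexed each batch
        stream.map(_.value).count().print()

        ssc.start()
        ssc.awaitTermination()
      }
    }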

Environment: Hadoop, Cloudera, MapReduce, Spark, Shark, Hive, Apache NiFi, Pig, Sqoop, Shell Scripting, Storm, Kafka, Datameer, Oracle, Teradata, SAS, Arcadia, Java 7.0, Nagios, Spring, JIRA, EMR.

Confidential, Tampa, Florida

Hadoop and Spark Developer

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java and NiFi for data cleaning and preprocessing.
  • Imported and exported data between HDFS and the Oracle database using Sqoop.
  • Installed and configured Hadoop Cluster for major Hadoop distributions.
  • Used Hive and Pig as ETL tools for event joins, filters, transformations, and pre-aggregations.
  • Created partitions and buckets (by state) in Hive to handle structured data, with Elasticsearch.
  • Developed an Oozie workflow to orchestrate a series of Pig scripts that cleanse data in the data preparation stage, for example removing personal information or merging many small files into a handful of large, compressed files using Pig pipelines.
  • Moved log files generated from various sources to HDFS for further processing through Elasticsearch, Kafka, and Flume, and processed the files using PiggyBank.
  • Extensively used Pig to communicate with Hive via HCatalog and with HBase via storage handlers.
  • Used Spark SQL with the Scala and Python interfaces, which automatically convert RDDs of case classes to schema RDDs.
  • Used Spark SQL to read and write tables stored in Hive on Amazon EMR (see the sketch after this list).
  • Performed Sqoop transfers through HBase tables to process data into several NoSQL databases, including Cassandra and MongoDB.
  • Created tables, secondary indexes, and join indexes in the Teradata development environment for testing.
  • Captured web server and Elasticsearch logs into HDFS using Flume for analysis.
  • Managed and reviewed Hadoop log files.
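
A minimal Scala sketch of the Spark SQL read/write pattern against Hive tables mentioned above; the database, table, and column names are illustrative, not the project's actual schema.

    import org.apache.spark.sql.SparkSession

    object HiveReadWrite {
      def main(args: Array[String]): Unit = {
        // Hive support lets Spark SQL use the same metastore tables that Hive on EMR sees
        val spark = SparkSession.builder()
          .appName("hive-read-write")
          .enableHiveSupport()
          .getOrCreate()

        // Illustrative database, table, and column names
        val events = spark.sql("SELECT state, event_type, amount FROM warehouse.events")

        val byState = events.groupBy("state", "event_type").count()

        // Write the aggregate back as a Hive table partitioned by state
        byState.write
          .mode("overwrite")
          .partitionBy("state")
          .saveAsTable("warehouse.event_counts_by_state")

        spark.stop()
      }
    }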

Environment: Hive, Pig, MapReduce, Apache Nifi, Sqoop, Oozie, Flume, Kafka, EMR, Storm, HBase, Unix, Linux, Python, SQL, Hadoop 1.x, HDFS, GitHub, Talend, Python Scripting.

Confidential, Dallas, Texas

Hadoop developer

Responsibilities:

  • As a Big Data Developer, implemented solutions for ingesting data from various sources and processing the data at rest utilizing Big Data technologies such as Hadoop, MapReduce frameworks, HBase, Hive, Oozie, Flume, Sqoop, and Elasticsearch.
  • Designed and Implemented real-time Big Data processing to enable real-time analytics, event detection and notification for Data-in-Motion.
  • Hands-on experience with Confidential Big Data product offerings such as Confidential InfoSphere BigInsights, Confidential InfoSphere Streams, and Confidential BigSQL.
  • Loaded and transformed large sets of structured and semi-structured data using Hive and Impala, with Elasticsearch.
  • Worked on an Asset Tracking project, collecting real-time vehicle location data with Confidential Streams from a JMS queue and processing it for vehicle tracking using ESRI GIS mapping software, Scala, and the Akka actor model (see the sketch after this list).
  • Developed Hive queries in the BigSQL client for various use cases.
  • Developed shell scripts and automated them using the cron job scheduler.
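
A minimal sketch of the Akka actor-model pattern referenced in the Asset Tracking bullet above; the actor and message names are illustrative, and in the real pipeline the messages came from the JMS queue consumer rather than being sent directly.

    import akka.actor.{Actor, ActorSystem, Props}

    // Illustrative message carrying a vehicle's reported position
    final case class VehicleLocation(vehicleId: String, lat: Double, lon: Double)

    // Each actor processes its mailbox one message at a time, so no explicit locking is needed
    class LocationTracker extends Actor {
      override def receive: Receive = {
        case VehicleLocation(id, lat, lon) =>
          // A real tracker would update GIS state or forward to the mapping service
          println(s"Vehicle $id reported position ($lat, $lon)")
      }
    }

    object TrackingApp {
      def main(args: Array[String]): Unit = {
        val system = ActorSystem("vehicle-tracking")
        val tracker = system.actorOf(Props[LocationTracker], "tracker")

        tracker ! VehicleLocation("truck-042", 41.8781, -87.6298)
        tracker ! VehicleLocation("truck-007", 27.9506, -82.4572)

        system.terminate()
      }
    }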

Environment: Hadoop 1.x, Hive 0.10, Pig 0.11, Sqoop, HBase, UNIX Shell Scripting, Scala, Akka, Confidential InfoSphere BigInsights, Confidential InfoSphere Streams, Confidential BigSQL, Java.

Confidential - Chicago, IL

Hadoop/Spark Consultant

Responsibilities:

  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Spark SQL, and Scala; extracted large datasets from data lakes, Cassandra, and Oracle servers into HDFS and vice versa using Sqoop.
  • Experience managing AWS Hadoop clusters and services using Hortonworks Manager.
  • Explored improving the performance and optimization of existing algorithms in Hadoop with Spark, using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Worked with different file formats such as Text, Sequence files, Avro, ORC and Parquet.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data (see the sketch after this list).
  • Imported data from different sources such as HDFS and HBase into Spark RDDs.
  • Used Kafka and Storm for real-time analytics, including AML data analytics.
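
A minimal Scala sketch of the pair-RDD and file-format work described above: data is read from HDFS into an RDD, aggregated with reduceByKey, converted to a DataFrame, and written in a columnar format. The path, delimiter, and column names are illustrative.

    import org.apache.spark.sql.SparkSession

    object PairRddFormats {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("pair-rdd-formats").getOrCreate()
        val sc = spark.sparkContext
        import spark.implicits._

        // Illustrative HDFS path: tab-separated (accountId, amount) records
        val lines = sc.textFile("hdfs:///data/transactions/*.tsv")

        // Pair RDD keyed by account, aggregated with reduceByKey
        val totals = lines
          .map(_.split("\t"))
          .collect { case Array(account, amount) => (account, amount.toDouble) }
          .reduceByKey(_ + _)

        // Convert to a DataFrame and persist in a columnar format (Parquet here; ORC is analogous)
        totals.toDF("account", "total_amount")
          .write
          .mode("overwrite")
          .parquet("hdfs:///warehouse/account_totals")

        spark.stop()
      }
    }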

Environment: Hadoop, Hortonworks, MapReduce, Spark, Shark, Hive, Apache NiFi, Pig, Sqoop, Shell Scripting, Storm, Kafka, Datameer, Oracle, Teradata, SAS, Arcadia, Java 7.0, Nagios, Spring, JIRA.

Confidential - Sunnyvale, CA

Hadoop Developer

Responsibilities:

  • Created data lakes and data pipelines for different mobile-application events to filter and load consumer response data from Urban Airship in an AWS S3 bucket into Hive external tables in HDFS. Good experience with the Apache NiFi ecosystem.
  • Worked with file formats such as JSON, Avro, and Parquet and compression codecs such as Snappy, using the NiFi ecosystem.
  • Used data warehousing tools such as Talend and Teradata.
  • Developed Impala scripts for end-user/analyst requirements for ad hoc analysis.
  • Used various Hive optimization techniques such as partitioning, bucketing, and map joins.
  • Developed Python code for tasks, dependencies, SLA watchers, and time sensors for each job, for workflow management and automation with Airflow.
  • Developed shell scripts for adding dynamic partitions to Hive staging tables, verifying JSON schema changes in source files, and detecting duplicate files in the source location.
  • Developed UDFs in Spark to capture the values of key-value pairs in encoded JSON strings.
  • Developed a Spark application to filter JSON source data in an AWS S3 location, store it in HDFS with partitions, and extract the schema of the JSON files using Spark (see the sketch after this list).
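
A minimal Scala sketch of the S3-to-HDFS filtering job described above; the bucket, paths, column names, and partition column are illustrative placeholders.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col

    object S3JsonToHdfs {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("s3-json-filter").getOrCreate()

        // Spark infers the JSON schema on read; bucket and prefix are placeholders
        val events = spark.read.json("s3a://mobile-events-bucket/raw/")
        events.printSchema()   // extract and inspect the inferred schema

        // Keep only responses for one app and drop records without a user id
        val filtered = events
          .filter(col("app_name") === "my_mobile_app" && col("user_id").isNotNull)

        // Land the result in HDFS, partitioned by an event-date column assumed to exist
        filtered.write
          .mode("append")
          .partitionBy("event_date")
          .parquet("hdfs:///data/consumer_responses/")

        spark.stop()
      }
    }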

Environment: Hive, Apache Nifi, Spark, AWS S3, EMR, Cloudera, Jenkins, Shell scripting, Hbase, Airflow, Intellij IDEA, Sqoop, Impala.

Confidential

Junior Java Developer/ Software Intern (Hadoop & SQL)

Responsibilities:

  • Developed the presentation layer using JSP, HTML, and CSS, and client-side validations using JavaScript.
  • Involved in designing and developing an e-commerce site using JSP, Servlets, EJBs, JavaScript, and JDBC.
  • Used Eclipse 6.0 as IDE for application development.
  • Used JDBC to connect the web applications to Databases.
  • Developed and utilized J2EE services and JMS components for messaging communication in WebLogic.
  • Worked on analyzing the Hadoop cluster using different Big Data analytic tools, including Pig, Hive, and MapReduce.
  • Developed parser and loader MapReduce applications to retrieve data from HDFS and store it in HBase and Hive.
  • Imported data from MySQL and Oracle databases into HDFS using Sqoop.
  • Imported unstructured data into HDFS using Flume.
  • Wrote MapReduce Java programs to analyze log data for large-scale data sets (see the sketch after this list).
  • Involved in creating Hive (HCatalog) tables and loading and analyzing data using Hive queries.
  • Worked hands-on with the ETL process and developed Hive scripts for extracting, transforming, and loading data into other data warehouses.
  • Used Hive join queries to join multiple tables of a source system and load the results into Elasticsearch tables.
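
A minimal sketch of the kind of log-analysis MapReduce job described above. The resume mentions Java programs; for consistency with the other sketches this version uses the same Hadoop MapReduce API from Scala, and the log format and counting logic are illustrative.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
    import org.apache.hadoop.mapreduce.{Job, Mapper, Reducer}
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
    import scala.collection.JavaConverters._

    // Emit (logLevel, 1) per line, assuming lines like "2016-01-01 12:00:00 ERROR ..."
    class LogLevelMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
      private val one = new IntWritable(1)
      private val level = new Text()
      override def map(key: LongWritable, value: Text,
                       ctx: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit = {
        val fields = value.toString.split("\\s+")
        if (fields.length >= 3) { level.set(fields(2)); ctx.write(level, one) }
      }
    }

    // Sum the counts for each log level
    class SumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
      override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                          ctx: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
        val total = values.asScala.map(_.get).sum
        ctx.write(key, new IntWritable(total))
      }
    }

    object LogLevelCount {
      def main(args: Array[String]): Unit = {
        val job = Job.getInstance(new Configuration(), "log-level-count")
        job.setJarByClass(getClass)
        job.setMapperClass(classOf[LogLevelMapper])
        job.setCombinerClass(classOf[SumReducer])   // combiner reuses the reducer logic
        job.setReducerClass(classOf[SumReducer])
        job.setOutputKeyClass(classOf[Text])
        job.setOutputValueClass(classOf[IntWritable])
        FileInputFormat.addInputPath(job, new Path(args(0)))     // input log directory
        FileOutputFormat.setOutputPath(job, new Path(args(1)))   // output directory
        System.exit(if (job.waitForCompletion(true)) 0 else 1)
      }
    }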
