Big Data Developer Resume

Jacksonville, FL

SUMMARY

  • Over 7 years of professional experience covering analysis, design, development, integration, deployment and maintenance of quality software applications using Java/J2EE and Big Data Hadoop technologies.
  • Over 4 years of working experience in data analysis and data mining using the Big Data stack.
  • Proficiency in Java, Hadoop MapReduce, Pig, Hive, Oozie, Sqoop, Flume, HBase, Scala, Spark, Kafka, Storm, Impala and NoSQL databases.
  • High exposure to Big Data technologies and the Hadoop ecosystem, with in-depth understanding of MapReduce and the Hadoop infrastructure.
  • Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, YARN and the MapReduce programming paradigm.
  • Good exposure to column-oriented NoSQL databases such as HBase and Cassandra.
  • Extensive experience working with semi-structured and unstructured data by implementing complex MapReduce programs using design patterns.
  • Extensive experience writing custom MapReduce programs for data processing and UDFs for both Hive and Pig in Java (see the UDF sketch after this list).
  • Strong experience analyzing large data sets by writing Pig scripts and Hive queries.
  • Extensive experience working with structured data using HiveQL and join operations, writing custom UDFs, and optimizing Hive queries.
  • Experience importing and exporting data between HDFS and relational databases using Sqoop.
  • Experienced in job workflow scheduling and monitoring tools like Oozie.
  • Experience with Apache Flume for collecting, aggregating and moving large volumes of data from various sources such as web servers and telnet sources.
  • Hands-on experience with major Big Data components: Apache Kafka, Apache Spark, ZooKeeper and Avro.
  • Experienced in implementing unified data platforms using Kafka producers/consumers and in pre-processing data with Storm topologies.
  • Experienced in migrating MapReduce programs to Spark RDD transformations and actions to improve performance.
  • Experience using Big Data with ETL (Talend).
  • Experience with ETL (Extract, Transform and Load) tools: Talend Open Studio and Informatica.
  • Strong experience architecting real-time streaming applications and batch-style, large-scale distributed computing applications using tools such as Spark Streaming, Spark SQL, Kafka, Flume, MapReduce and Hive.
  • Experience using various Hadoop distributions (Cloudera, Hortonworks, MapR, etc.) to fully implement and leverage new Hadoop features.
  • Worked on custom Pig Loader and Storage classes to handle a variety of data formats such as JSON and compressed CSV.
  • Good knowledge of Amazon AWS services such as EMR and EC2 web services, which provide fast and efficient processing of Big Data.
  • Proficient in Java, J2EE, JDBC, Collections, Servlets, JSP, Spring, Hibernate, JSON, XML, REST, SOAP web services and EclipseLink.
  • Extensive experience working with SOA-based architectures using REST and SOAP services built with JAX-RS and JAX-WS.
  • Experienced in working with scripting technologies such as Python and Unix shell scripts.
  • Experience with source control repositories such as SVN, CVS and Git.
  • Strong experience working in UNIX/Linux environments and writing shell scripts.
  • Skilled at building and deploying multi-module applications using Maven and Ant with CI servers like Jenkins.
  • Adequate knowledge of and working experience with Agile and Waterfall methodologies.
  • Experienced in requirement analysis, application development, application migration and maintenance using the Software Development Lifecycle (SDLC) and Java/J2EE technologies. Proficient in producing and consuming REST- and SOAP-based web services using JAX-RS and JAX-WS.
  • Experience in writing database objects like Stored Procedures, Functions, Triggers, PL/SQL packages and Cursors for Oracle, SQL Server, MySQL & Sybase databases.
  • Developed and maintained web applications deployed on the Tomcat web server.
  • Expertise in web page development using JSP, HTML, JavaScript, jQuery and Ajax.
  • Excellent problem-solving and analytical skills.
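As an illustration of the custom Hive UDF work referenced above, here is a minimal Java sketch against the standard org.apache.hadoop.hive.ql.exec.UDF API; the package, class name and normalization logic are hypothetical, not taken from any of the projects below.

    package com.example.hive.udf; // hypothetical package

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical UDF that normalizes free-form string values (trim + upper-case)
    // so they can be grouped consistently in Hive queries.
    public final class NormalizeText extends UDF {
        public Text evaluate(final Text input) {
            if (input == null) {
                return null;
            }
            return new Text(input.toString().trim().toUpperCase());
        }
    }

Packaged as a JAR, such a function would be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called from HiveQL.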

TECHNICAL SKILLS

Hadoop/Big Data Technologies: HDFS, MapReduce, Sqoop, Flume, Pig, Hive, Oozie, Impala, Apache NiFi, ZooKeeper and Cloudera Manager.

NoSQL Database: MongoDB, Cassandra

Real Time/Stream processing: Apache Storm, Apache Spark

Distributed message broker: Apache Kafka

Monitoring and Reporting: Tableau, Custom shell scripts

Hadoop Distribution: Hortonworks, Cloudera, MapR

Build Tools: Maven, SQL Developer

Programming & Scripting: Java, C, SQL, Shell Scripting, Python

Java Technologies: Servlets, JavaBeans, JDBC, Spring, Hibernate, SOAP/Rest services

Databases: Oracle, MySQL, MS SQL Server, Teradata

Web Dev. Technologies: HTML, XML, JSON, CSS, jQuery, JavaScript, AngularJS

Tools & Utilities: Eclipse, NetBeans, SVN, CVS, SOAP UI, MQ Explorer, RFHUtil, JMX Explorer, SSRS, Aqua Data Studio, XMLSpy, ETL (Talend)

Operating Systems: Linux, Unix, Mac OS X, Windows 8, Windows 7, Windows Server 2008/2003

PROFESSIONAL EXPERIENCE

Confidential, Jacksonville, FL

Big Data Developer

Responsibilities:

  • Collected and aggregated large amounts of data from different sources such as COSMA (Confidential Onboard System Management Agent), BOMR (Back Office Message Router), ITCM (Interoperable Train Control Messaging), and onboard mobile and network devices on the PTC (Positive Train Control) network using Apache NiFi, and stored the data in HDFS for analysis.
  • Used Apache NiFi to ingest data from IBM MQ message queues.
  • Implemented NiFi flow topologies to perform cleansing operations before moving data into HDFS.
  • Developed Java MapReduce programs to transform ITCM log data into a structured form (see the mapper sketch after this list).
  • Developed optimal strategies for distributing the ITCM log data over the cluster, and imported and exported the stored log data into HDFS and Hive using Apache NiFi.
  • Developed custom code to read messages off IBM MQ and to place them onto the NiFi queues.
  • Worked with Apache NiFi flows to convert raw XML data into JSON and Avro.
  • Implemented Hive generic UDFs to incorporate business logic into Hive queries.
  • Configured Spark Streaming to receive real-time data from IBM MQ and store the stream data in HDFS.
  • Analyzed the bandwidth data from the locomotives using HiveQL to extract the bandwidth consumed by each locomotive per day across different carriers (AT&T, Verizon or Wi-Fi).
  • Designed and implemented Hive queries and functions for evaluation, filtering, loading and storing of data.
  • Installed and configured Tableau Desktop to connect to the Hortonworks Hive framework (database), which contains the bandwidth data from the locomotives, through the Hortonworks ODBC connector for further analytics.
  • Collected and provided locomotive communication usage data by locomotive, channel, protocol and application.
  • Analyzed the locomotive communication usage from COSMA to monitor inbound/outbound traffic bandwidth by communication channel.
  • Worked on the back-end Hive database to provide both historical and live bandwidth data from the locomotives to Tableau for historical and live reporting.
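A minimal sketch of the kind of Java MapReduce transformation described above, turning raw log lines into structured, tab-delimited records; the field layout and class name are assumptions rather than the actual project code.

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Hypothetical mapper: splits a raw log line on whitespace and emits a
    // tab-delimited record of timestamp, device id and message payload.
    public class LogStructureMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

        private final Text structured = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] parts = value.toString().trim().split("\\s+", 3);
            if (parts.length < 3) {
                return; // skip malformed lines
            }
            structured.set(parts[0] + "\t" + parts[1] + "\t" + parts[2]);
            context.write(structured, NullWritable.get());
        }
    }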

Environment: Hortonworks Data Platform (HDP), Hortonworks DataFlow (HDF), Hadoop, HDFS, Spark, Hive, MapReduce, Apache NiFi, Tableau Desktop, Linux.

Confidential, Houston, TX

Big Data Systems Engineer

Responsibilities:

  • Installed and configured a three-node cluster with Hortonworks Data Platform (HDP 2.3) on HP infrastructure and its management tooling.
  • Worked with HP Intelligent Provisioning and the Smart Storage array to set up the disks for the installation.
  • Used a Big Data benchmarking tool called BigBench to benchmark the three-node cluster.
  • Configured BigBench and ran it on one of the nodes in the cluster.
  • Ran the benchmark on datasets of 5 GB, 10 GB, 50 GB, 100 GB and 1 TB.
  • Worked with the structured, semi-structured and unstructured data generated by BigBench, running workloads that use Spark's machine learning libraries.
  • Configured PAT (Performance Analysis Tool) to dump the benchmark results into automated MS Excel charts.
  • Used Ambari Server to monitor the cluster while the benchmark was running.
  • Worked with different teams to install operating system updates, Hadoop updates, patches and Hortonworks version upgrades as required.
  • Collected performance metrics from the Hadoop nodes; the Performance Analysis Tool (PAT) was used to analyze resource utilization and to draw automated charts in MS Excel.
  • Worked with various performance monitoring tools such as top, dstat, atop and Ambari Metrics.
  • Collected the results from the different dataset tests (5 GB, 10 GB, 50 GB, 100 GB and 1 TB) on the server and fed them into PAT for further analysis of resource utilization.
  • Worked with HPE Insight CMU (Cluster Management Utility) for managing the cluster and with HPE Vertica for SQL on Hadoop.
  • Worked on configuring the performance tuning parameters used during the benchmark.
  • Used Tableau Desktop to create visual dashboards of CPU utilization, disk I/O, memory, network I/O and query times obtained from the automated MS Excel charts produced by PAT.
  • Fed the benchmark output, in the form of automated charts, into Tableau Desktop for further data analytics.
  • Installed and configured Tableau Desktop on one of the three nodes to connect to the Hortonworks Hive framework (database) through the Hortonworks ODBC connector for further analysis of the cluster (a programmatic connection along the same lines is sketched after this list).
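Tableau connected to Hive through the Hortonworks ODBC driver; purely as an illustration (an assumption, not part of the original setup), a Java client could reach the same HiveServer2 instance over JDBC. Host, port, credentials, table and column names below are hypothetical.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveQueryCheck {
        public static void main(String[] args) throws Exception {
            // HiveServer2 JDBC driver; connection details are hypothetical.
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:hive2://hdp-node1:10000/default", "hive", "");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(
                         "SELECT query_name, duration_ms FROM bigbench_results LIMIT 10")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                }
            }
        }
    }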

Environment: Hortonworks Data Platform (HDP), Hadoop, HDFS, Spark, Hive, MapReduce, BigBench, Tableau Desktop, Linux.

Confidential, Monroeville, PA

Sr. Big Data/Hadoop Developer

Responsibilities:

  • Collected and aggregated large amounts of web log data from different sources such as web servers, mobile and network devices using Apache Kafka and stored the data in HDFS for analysis.
  • Implemented Storm builder topologies to perform cleansing operations before moving data into Cassandra.
  • Developed Java MapReduce programs to transform the log data into a structured form.
  • Developed optimal strategies for distributing the web log data over the cluster; imported and exported the stored web log data into HDFS and Hive using Sqoop.
  • Implemented Hive generic UDFs to incorporate business logic into Hive queries.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS (see the streaming sketch after this list).
  • Worked with Spark to create structured data from the pool of unstructured data received.
  • Converted Hive queries to Spark SQL, using Parquet files as the storage format.
  • Implemented Spark RDD transformations and actions to migrate MapReduce algorithms.
  • Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration and the most visited pages on the website.
  • Integrated Oozie with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Map-Reduce, Pig, Hive, and Sqoop) as well as system specific jobs (such as Java programs and shell scripts).
  • Designed and implemented Hive queries and functions for evaluation, filtering, loading and storing of data.
  • Familiar with ETL (Talend) and data integration jobs designed for IT and BI analysts to schedule.
  • Created Hive tables and worked on them using HiveQL.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Involved in building applications using Maven and integrating with CI servers like Jenkins to build jobs.
  • Worked collaboratively with all levels of business stakeholders to architect, implement and test Big Data based analytical solution from disparate sources.
  • Involved in complete SDLC of project including requirements gathering, design documents, development, testing and production environments.
  • Involved in Agile methodologies, daily scrum meetings, sprint planning.
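A minimal sketch, under stated assumptions, of the Spark Streaming ingestion described above: it reads a Kafka topic through the spark-streaming-kafka-0-10 direct-stream integration and writes each micro-batch to HDFS. The broker address, topic, group id and HDFS path are hypothetical, and the project may well have used a different Kafka integration version.

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    public class WebLogStreamToHdfs {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("WebLogStreamToHdfs");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "broker1:9092"); // hypothetical broker
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "weblog-stream");
            kafkaParams.put("auto.offset.reset", "latest");

            JavaInputDStream<ConsumerRecord<String, String>> stream =
                    KafkaUtils.createDirectStream(
                            jssc,
                            LocationStrategies.PreferConsistent(),
                            ConsumerStrategies.<String, String>Subscribe(
                                    Arrays.asList("weblogs"), kafkaParams));

            // Keep only the message payloads and write each non-empty batch to HDFS.
            JavaDStream<String> lines = stream.map(ConsumerRecord::value);
            lines.foreachRDD((rdd, time) -> {
                if (!rdd.isEmpty()) {
                    rdd.saveAsTextFile("hdfs:///data/weblogs/" + time.milliseconds());
                }
            });

            jssc.start();
            jssc.awaitTermination();
        }
    }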

Environment: Hadoop, HDFS, MapReduce, Hive, Sqoop, Spark, Scala, Kafka, Oozie, Storm, Cassandra, Maven, Shell Scripting, CDH.

Confidential, Springfield, IL

Big Data/Hadoop Developer

Responsibilities:

  • Involved in creating Hive tables, and loading and analyzing data using Hive queries.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Migrated complex MapReduce programs to in-memory Spark processing using transformations and actions.
  • Mentored the analyst and test teams in writing Hive queries.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Worked with Linux systems and RDBMS databases on a regular basis in order to ingest data using Sqoop.
  • Used ETL (Talend) for extraction, transformation and loading of data from multiple sources.
  • Implemented Kafka consumers to move data from Kafka partitions into Cassandra for near-real-time analysis (see the consumer sketch after this list).
  • Used Cassandra Query Language (CQL) to implement CRUD operations on Cassandra.
  • Developed and maintained complex outbound notification applications that run on custom architectures, using diverse technologies including Core Java, J2EE, SOAP, XML, JMS, JBoss and web services.
  • Ran Hadoop streaming jobs to process terabytes of XML-format data.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Generated the datasets and loaded them into the Hadoop ecosystem.
  • Assisted in exporting analyzed data to relational databases using Sqoop.
  • Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries, Pig scripts and Sqoop jobs.
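A rough illustration of the Kafka-to-Cassandra path mentioned above, using the plain Kafka consumer API (Kafka 2.x-style poll) and the DataStax 3.x Java driver; the broker, topic, keyspace, table and column names are hypothetical.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;

    public class EventsToCassandra {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092"); // hypothetical broker
            props.put("group.id", "events-to-cassandra");
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
                 Cluster cluster = Cluster.builder().addContactPoint("cassandra1").build();
                 Session session = cluster.connect("analytics")) { // hypothetical keyspace

                consumer.subscribe(Collections.singletonList("events"));
                PreparedStatement insert = session.prepare(
                        "INSERT INTO events_by_key (event_key, payload) VALUES (?, ?)");

                // Poll Kafka and write each record into Cassandra for near-real-time access.
                while (true) {
                    ConsumerRecords<String, String> records =
                            consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        session.execute(insert.bind(record.key(), record.value()));
                    }
                }
            }
        }
    }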

Environment: Hortonworks, Hadoop, HDFS, Spark, Oozie, Pig, Hive, MapReduce, Sqoop, Cassandra, Linux.

Confidential, Des Moines, IA

Hadoop Developer

Responsibilities:

  • Developed parser and loader MapReduce applications to retrieve data from HDFS and store it in HBase and Hive.
  • Imported data from MySQL into HDFS using Sqoop.
  • Imported unstructured data into HDFS using Flume.
  • Used Oozie to orchestrate the MapReduce jobs that extract the data in a timely manner.
  • Wrote MapReduce Java programs to analyze log data for large-scale data sets.
  • Used the HBase Java API from the Java application (see the HBase sketch after this list).
  • Automated all the jobs for extracting data from different data sources such as MySQL and pushing the result sets to the Hadoop Distributed File System.
  • Customized the parser/loader application for data migration to HBase.
  • Developed Pig Latin scripts to extract the data from the output files to load into HDFS.
  • Developed custom UDFs and implemented Pig scripts.
  • Implemented MapReduce jobs using the Java API as well as Pig Latin and HiveQL.
  • Participated in the setup and deployment of the Hadoop cluster.
  • Hands-on design and development of an application using Hive UDFs.
  • Responsible for writing Hive queries for analyzing data in the Hive warehouse using Hive Query Language (HQL).
  • Provided support to data analysts in running Pig and Hive queries.
  • Involved in writing HiveQL and Pig Latin.
  • Imported and exported data between MySQL/Oracle and Hive using Sqoop.
  • Imported and exported data between MySQL/Oracle and HDFS.
  • Configured the HA cluster for both manual and automatic failover.
  • Designed and built many applications to deal with vast amounts of data flowing through multiple Hadoop clusters, using Pig Latin and Java-based map-reduce.
  • Specified the cluster size, allocated resource pools and configured the Hadoop distribution by writing the specifications in JSON file format.
  • Created a Solr schema from the indexer settings.
  • Implemented Solr index cron jobs.
  • Wrote Solr queries for various search documents.
  • Responsible for defining the data flow within the Hadoop ecosystem and directing the team in implementing it.
  • Exported the result sets from Hive to MySQL using shell scripts.
  • Developed Hive queries for the analysts.
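A minimal sketch of the HBase Java API usage referenced above (HBase 1.x client); the table name, column family and row-key layout are assumptions, not the project's actual schema.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class LogRecordWriter {
        public static void main(String[] args) throws Exception {
            // Reads hbase-site.xml from the classpath for ZooKeeper quorum details.
            Configuration conf = HBaseConfiguration.create();
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("parsed_logs"))) {
                // Illustrative layout: source + timestamp as the row key,
                // one column family "d" holding the parsed fields.
                Put put = new Put(Bytes.toBytes("mysql-feed|2016-01-01T00:00:00"));
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("level"), Bytes.toBytes("INFO"));
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("message"), Bytes.toBytes("job completed"));
                table.put(put);
            }
        }
    }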

Environment: Apache Hadoop, Hive, Hue, ZooKeeper, MapReduce, Sqoop, Crunch API, Pig 0.10 and 0.11, HCatalog, Unix, Java, JSP, Eclipse, Maven, SQL, HTML, XML, Oracle, SQL Server, MySQL
