
Big Data Developer Resume


New York, NY

SUMMARY

  • 6 years of experience in software development and building applications in information technology.
  • More than 5 years of hands-on experience with Big Data development tools and data analytics.
  • Extensive hands-on experience with programming languages including Java, Python, R, Scala, and SAS.
  • Hands-on experience building efficient MapReduce programs on Apache Hadoop for running jobs and analyzing big data.
  • Experience in Apache Spark with Scala, Python and Java.
  • Experience with NoSQL databases, including researching and implementing them for ongoing projects based on requirements.
  • Intensive hands-on experience with Sqoop to import data from Microsoft SQL Server to HDFS and vice versa.
  • In-depth experience with Big Data development tools: Hive, Sqoop, HBase, Pig, Storm, Oozie, and Flume.
  • Experience developing PySpark code to create RDDs, pair RDDs, and DataFrames, and using Spark SQL and Spark Streaming for fast data processing (see the sketch after this list).
  • Experience as a Data Engineer performing data extraction, transformation, and loading.
  • Experienced in working with Data Visualization using tools like Tableau, Plotly and QlikView.
  • Hands-on experience in data visualization with Python, R, and SAS using libraries such as Matplotlib and Plotly.
  • Understanding of and experience with common machine learning algorithms and their implementations.
  • Good knowledge of NoSQL databases and the ability to perform different transactions.
  • Built custom ETL pipelines to extract, transform, and load data from MS SQL Server to HDFS.
  • Experience in building applications using Python and Java.
  • Experience in building and understanding Natural Language Processing programs.
  • Practical experience and knowledge in data mining and text analytics using Python and R.
  • Worked on connecting applications in several programming languages to SQL databases for development and deployment.
  • Hands-on experience using Cloudera Manager and the Hortonworks distribution to monitor and manage clusters.
  • Participated in Fraud Detection, Anti-Money Laundering, and Know Your Customer projects.
  • Experience in ETL and querying with Big Data languages such as Pig Latin and HiveQL.
  • Knowledge of Big Data machine learning toolkits such as Mahout and Spark ML.
  • Good hands-on experience handling different data formats such as JSON and XML.
  • Good knowledge of and experience with RESTful APIs.
  • Familiar with AWS components such as EC2 and S3; also created instances on other cloud platforms, Azure and GCP.
  • Hands-on experience with the MarkLogic NoSQL database; also worked with MongoDB and Cassandra.
  • Experience with job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
  • Experience in data extraction programs such as crawling and scraping using Python.
  • Knowledge of different NoSQL databases and performing business analysis with visualization tools.
  • In-depth knowledge of JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
  • Developed projects using IDEs such as NetBeans, IntelliJ IDEA, and Eclipse.
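
A minimal PySpark sketch of the RDD, pair RDD, DataFrame, and Spark SQL work mentioned above; the sample records and column names are invented for illustration only.

    # Minimal PySpark sketch: RDD -> pair RDD -> DataFrame -> Spark SQL.
    # The sample lines and column names below are made up for illustration.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("summary-sketch").getOrCreate()
    sc = spark.sparkContext

    # RDD and pair RDD: word counts over a tiny in-memory dataset.
    lines = sc.parallelize(["spark makes big data simple", "big data with spark"])
    pairs = lines.flatMap(lambda line: line.split()).map(lambda w: (w, 1))
    counts = pairs.reduceByKey(lambda a, b: a + b)

    # DataFrame and Spark SQL over the same counts.
    df = counts.toDF(["word", "n"])
    df.createOrReplaceTempView("word_counts")
    spark.sql("SELECT word, n FROM word_counts ORDER BY n DESC").show()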

TECHNICAL SKILLS

Programming: Java 1.5+, Python 2.7, R, SAS, Scala 2.10

NoSQL: MongoDB 3.4, MarkLogic 8, Cassandra

Big Data: Hadoop 2.7.4, MapReduce, Spark 2.2, Sqoop 1.4.5, Pig, Hive, HBase, HDFS

Other Skills: Major Machine Learning Algorithms, RESTful Web Services, XQuery, XSLT, Apache Tomcat

Databases: Microsoft SQL Server, MySQL, Oracle

Visualization/Analytical Tools: Tableau 9, 10

Operating Systems: Linux, Windows

Scripting: Shell Scripting

Version Control: Git, SVN

PROFESSIONAL EXPERIENCE

Confidential - New York, NY

Big Data Developer

Responsibilities:

  • Responsible for building a Spark application that extracts different forms of raw data and transforms them into an understandable format used for reporting.
  • Wrote Scala code for different pipelines to transform raw data and store it in Hive tables.
  • Wrote Java code to parse unstructured data from files provided by different banks.
  • Built a database on an Oracle server and inserted the parsed data into different tables as required.
  • Joined tables based on transformation rules provided by users and inserted the results into landing tables.
  • Built Hadoop pipelines to extract, load, transform, and report on the data.
  • Wrote Spark code, with Scala as the primary programming language, to perform critical calculations.
  • Created Hive tables in the Hadoop environment mirroring the Oracle database schema and landed data from the Spark transformations.
  • Used the Parquet file format to store the data in the Hive database (see the sketch after this list).
  • Because these Spark applications use some tables from Oracle, built a Spark application to read those tables into Hadoop and store them for further use.
  • Migrated the entire Oracle code base to Spark using Scala to run in parallel on Hadoop.
  • Built Tableau dashboards to showcase the processed data and treasury data in structured tables.
  • Participated in building stress reports that internally use Monte Carlo simulation.
  • Wrote complex queries in Hive/Impala to retrieve data from landing and transformation tables.
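
A minimal sketch of landing Spark output into a Parquet-backed Hive table, as referenced above. The production code described here was written in Scala, so this PySpark version, along with its paths, table, and column names, is an illustrative assumption only.

    # Minimal PySpark sketch: write transformed data into a Parquet-backed Hive table.
    # The original work was in Scala; paths, table, and column names are hypothetical.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("land-to-hive")
        .enableHiveSupport()   # allows Spark to create and write Hive tables
        .getOrCreate()
    )

    # Read raw records already staged in HDFS (e.g. parsed bank files).
    raw = spark.read.json("hdfs:///data/raw/transactions/")

    # Apply a simple transformation rule and keep only the reporting columns.
    landed = raw.filter(raw["amount"] > 0).select("account_id", "amount", "txn_date")

    # Store as Parquet so Hive/Impala queries can read it from the landing schema.
    landed.write.mode("overwrite").format("parquet").saveAsTable("landing.transactions")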

Environment: Hadoop 2.4, Apache Spark 2.2, Oracle, Scala 2.11, Java 1.8, Hive 2.0, Impala, Tableau 9, MapReduce, Yarn, HDFS, Shell Scripting, Windows, Linux

Confidential

Data Engineer and Analyst

Responsibilities:

  • Helped build a 10-node Hadoop cluster for handling huge datasets every day.
  • Built an application to read and write 50 million records per day (~1 TB/day) into HDFS.
  • Wrote Sqoop commands to extract historical data from MS SQL Server to HDFS.
  • Loaded data from HDFS into Hive tables, inserting only the required columns to store data efficiently.
  • Built a MapReduce application in Java to mine user details from the daily and quarterly data.
  • Automated the daily process of mining user details from transaction data.
  • Implemented different functionality in the MapReduce program for different datasets.
  • Used Java to build the MapReduce program and used the Aho-Corasick algorithm for string search (see the sketch after this list).
  • Built a shell script to run the Sqoop and MapReduce jobs daily, in order.
  • Wrote Sqoop commands to load output from HDFS back into MS SQL Server after the jobs finished.
  • Implemented code to connect to SQL Server from the mapper and upload data directly into the RDBMS using JDBC connectors.
  • Built a custom pattern-search API on top of an Aho-Corasick package to perform regular-expression-style matching in Spark using Scala.
  • Wrote batch commands on a Windows server to run shell scripts in the Hadoop environment via PuTTY.
  • Participated in building the Hadoop environment with the available resources and required tools.
  • Migrated the MapReduce program to Spark for data transformations and actions.
  • Divided the mined data into classifications and criteria based on accuracy.
  • Participated in building a web interface in Java for updating and downloading the reference list.
  • Used Tableau to showcase user density by state.
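
The string search above was implemented as a Java MapReduce job; the standalone Python sketch below only illustrates the Aho-Corasick idea (a trie with failure links that matches many patterns in a single pass), with made-up patterns and input text.

    # Standalone sketch of Aho-Corasick multi-pattern search (trie + failure links).
    # Patterns and text are made up; the original was part of a Java MapReduce job.
    from collections import deque

    def build_automaton(patterns):
        goto, fail, out = [{}], [0], [set()]          # node 0 is the root
        for pat in patterns:                          # build the trie
            node = 0
            for ch in pat:
                if ch not in goto[node]:
                    goto[node][ch] = len(goto)
                    goto.append({}); fail.append(0); out.append(set())
                node = goto[node][ch]
            out[node].add(pat)
        queue = deque(goto[0].values())               # BFS to set failure links
        while queue:
            node = queue.popleft()
            for ch, child in goto[node].items():
                queue.append(child)
                f = fail[node]
                while f and ch not in goto[f]:
                    f = fail[f]
                fail[child] = goto[f].get(ch, 0)
                out[child] |= out[fail[child]]        # inherit matches from the suffix node
        return goto, fail, out

    def search(text, patterns):
        goto, fail, out = build_automaton(patterns)
        node, hits = 0, []
        for i, ch in enumerate(text):
            while node and ch not in goto[node]:
                node = fail[node]
            node = goto[node].get(ch, 0)
            for pat in out[node]:
                hits.append((i - len(pat) + 1, pat))  # (start index, matched pattern)
        return hits

    print(search("he said hershey", ["he", "she", "her"]))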

Environment: Hadoop 2.7, MapReduce v2, Yarn 2, Apache Spark 2.2, MS SQL Server, Sqoop 1.4.5, HDFS, Java 1.8, Scala 2.11, Shell Scripting, Linux

Confidential - Roseland, NJ

Big Data Developer

Responsibilities:

  • Responsible for landing multi-source data in HDFS using Spark Streaming.
  • Participated in converting existing Hadoop jobs to Spark jobs using Spark Core, Spark SQL, Spark Streaming, MLlib, and GraphX.
  • Identified patterns using text mining and text analytics in PySpark and SparkR.
  • Implemented different machine learning algorithms and checked for outliers.
  • Used Python to clean the data and structure it so it could easily be converted into DataFrames.
  • Wrote Python and R code to find outliers in the required indicators and normalize the data.
  • Transformed the unstructured transactions dataset into a SAS dataset.
  • Built a messaging system to collect data from different sources and produce it with Kafka.
  • Mined social media content and extracted relevant information based on the user's search query.
  • Performed NLP (Natural Language Processing) on this data using Python with the NLTK and CoreNLP libraries.
  • Also used R for deep learning on the data to look for patterns.
  • Participated in implementing a program to find and ignore stop words.
  • Implemented Principal Component Analysis (PCA) to perform dimensionality reduction (see the sketch after this list).
  • Developed an artificial intelligence application for image recognition.
  • Used the Microsoft Cognitive Toolkit and a convolutional neural network for image recognition.
  • Stored the data in the MongoDB NoSQL database and performed different transactions.
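
A minimal sketch of the stop-word removal and PCA steps above. NLTK and scikit-learn match tools named in this resume, but the sample documents and parameter choices are invented for illustration.

    # Minimal sketch: drop stop words, vectorize text, then reduce dimensions with PCA.
    # The sample documents below are made up for illustration.
    import nltk
    from nltk.corpus import stopwords
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import PCA

    nltk.download("stopwords", quiet=True)        # one-time corpus download
    stop_words = set(stopwords.words("english"))

    docs = [
        "the payment was flagged as unusual activity",
        "customer reported an unusual wire transfer",
        "routine monthly statement with no issues",
    ]

    # Remove stop words before vectorizing.
    cleaned = [" ".join(w for w in doc.split() if w not in stop_words) for doc in docs]

    # TF-IDF features, then PCA down to 2 components for plotting or outlier checks.
    features = TfidfVectorizer().fit_transform(cleaned).toarray()
    reduced = PCA(n_components=2).fit_transform(features)
    print(reduced.shape)   # (3, 2)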

Environment: Hadoop 2.4, Apache Spark 2.2, MLlib, Spark Streaming, Python 2.7, R, MongoDB 3.4, PySpark, SparkR, SAS, MarkLogic, Tableau 9

Confidential - New York, NY

Data Engineer and Analyst

Responsibilities:

  • Built a complex MapReduce program in Java to run jobs on large transaction data.
  • Used Hadoop to store large sets of unstructured data in HDFS.
  • Handled importing data into HDFS from different databases using Sqoop.
  • Transformed the data using Hive and MapReduce and loaded it into HDFS.
  • Performed real-time streaming of data using Spark with Kafka.
  • Created Tableau dashboards from the finance dataset.
  • Implemented R scripts in Tableau to create calculated parameters.
  • Built a small 4-node Hadoop cluster to test big data performance.
  • Wrote Pig scripts and HiveQL queries and performed analytics with them.
  • Crawled and scraped data from different websites and stored it in a database using Python (see the sketch after this list).
  • Built different visualizations in Python using Matplotlib and scikit-learn to showcase the analysis.
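
A minimal sketch of the crawl-scrape-store step mentioned above. The resume does not name the libraries or sites used, so requests, BeautifulSoup, the example URL, and the table schema are assumptions for illustration.

    # Minimal sketch: fetch a page, scrape headlines, store them in SQLite.
    # The URL and schema are placeholders, not from any project listed here.
    import sqlite3

    import requests
    from bs4 import BeautifulSoup

    URL = "https://example.com/articles"          # placeholder site

    response = requests.get(URL, timeout=10)
    response.raise_for_status()

    soup = BeautifulSoup(response.text, "html.parser")
    headlines = [h.get_text(strip=True) for h in soup.find_all("h2")]

    # Store the scraped headlines in a local SQLite database.
    conn = sqlite3.connect("scraped.db")
    conn.execute("CREATE TABLE IF NOT EXISTS headlines (text TEXT)")
    conn.executemany("INSERT INTO headlines (text) VALUES (?)",
                     [(h,) for h in headlines])
    conn.commit()
    conn.close()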

Environment: Hadoop 2.4.5, HDFS, MapReduce, Yarn, Java 1.8, Sqoop 1.4.4, Apache Spark, Kafka, Oozie, Windows, Linux (Ubuntu), SQL, Tableau, PIG, Hive, Python 2.7

Confidential

Big Data Analyst and Developer

Responsibilities:

  • Responsible for landing multi-source data in HDFS using Spark Streaming.
  • Participated in converting existing Hadoop jobs to Spark jobs using Spark Core, Spark SQL, Spark Streaming, MLlib, and GraphX.
  • Identified patterns using text mining and text analytics in PySpark and SparkR.
  • Implemented different machine learning algorithms and checked for outliers.
  • Used Python to clean the data and structure it so it could easily be converted into DataFrames.
  • Wrote Python and R code to find outliers in the required indicators and normalize the data.
  • Transformed the unstructured transactions dataset into a SAS dataset.
  • Built a messaging system to collect data from different sources and produce it with Kafka (see the sketch after this list).
  • Mined social media content and extracted relevant information based on the user's search query.
  • Performed NLP (Natural Language Processing) on this data using Python with the NLTK and CoreNLP libraries.
  • Also used R for deep learning on the data to look for patterns.
  • Participated in implementing a program to find and ignore stop words.
  • Implemented Principal Component Analysis (PCA) to perform dimensionality reduction.
  • Developed an artificial intelligence application for image recognition.
  • Used the Microsoft Cognitive Toolkit and a convolutional neural network for image recognition.
  • Stored the data in the MongoDB NoSQL database and performed different transactions.
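
A minimal sketch of producing multi-source records to Kafka from Python, as referenced above. The resume does not name the client library, so the kafka-python package, the broker address, and the topic name are assumptions.

    # Minimal sketch: produce records from two pretend sources to a Kafka topic.
    # kafka-python, "localhost:9092", and "raw-events" are illustrative assumptions.
    import json

    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    # Pretend these records came from two different upstream sources.
    records = [
        {"source": "social", "text": "sample post about the search query"},
        {"source": "transactions", "amount": 42.5, "currency": "USD"},
    ]

    for record in records:
        producer.send("raw-events", value=record)

    producer.flush()   # block until all buffered records are delivered
    producer.close()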

Environment: Hadoop 2.4, Apache Spark 2.2, MLlib, Spark Streaming, Python 2.7, R, MongoDB 3.4, PySpark, SparkR, SAS, MarkLogic, Tableau 9
