Big Data Developer Resume
New York, NY
SUMMARY
- 6 years of experience in software development and building applications in information technology.
- More than 5 years of hands-on experience with big data development tools and data analytics.
- Extensive hands-on experience with programming languages including Java, Python, R, Scala, and SAS.
- Hands-on experience building efficient MapReduce programs on Apache Hadoop to run jobs and analyze big data.
- Experience with Apache Spark using Scala, Python, and Java.
- Experience with NoSQL databases as part of research-and-implementation work on an ongoing project with changing requirements.
- Extensive hands-on experience using Sqoop to move data between Microsoft SQL Server and HDFS in both directions.
- In-depth experience with big data development tools including Hive, Sqoop, HBase, Pig, Storm, Oozie, and Flume.
- Experience developing PySpark code to create RDDs, paired RDDs, and DataFrames, and using Spark SQL and Spark Streaming for fast data processing (see the PySpark sketch at the end of this summary).
- Experience as a data engineer performing data extraction, transformation, and loading.
- Experienced in data visualization using tools such as Tableau, Plotly, and QlikView.
- Hands-on experience visualizing data in Python, R, and SAS using libraries such as Matplotlib and Plotly.
- Understanding of and experience with most common machine learning algorithms and their implementations.
- Good knowledge of NoSQL databases and able to perform different transactions.
- Built custom ETL pipelines to extract, transform, and load data from Microsoft SQL Server into HDFS.
- Experience building applications using Python and Java.
- Experience building and understanding natural language processing (NLP) programs.
- Practical experience and knowledge in data mining and text analytics using Python and R.
- Worked on connecting application code to SQL databases for development and deployment.
- Hands-on experience using Cloudera Manager and the Hortonworks distribution to monitor and manage clusters.
- Participated in fraud detection, anti-money-laundering (AML), and know-your-customer (KYC) projects.
- Experience with ETL and querying in big data languages such as Pig Latin and HiveQL.
- Knowledge of big data machine learning toolkits such as Mahout and Spark ML.
- Good hands-on experience handling different data formats such as JSON and XML.
- Good knowledge of and experience with RESTful APIs.
- Familiar with AWS components such as EC2 and S3; also created instances on other cloud platforms including Azure and GCP.
- Hands-on experience with the MarkLogic NoSQL database; also worked with MongoDB and Cassandra.
- Experience with job workflow scheduling (Oozie) and cluster coordination (ZooKeeper) tools.
- Experience writing data extraction programs such as web crawlers and scrapers in Python.
- Knowledge of different NoSQL databases and of performing business analysis with visualization tools.
- In-depth knowledge of JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
- Developed projects using IDEs such as NetBeans, IntelliJ IDEA, and Eclipse.
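A minimal PySpark sketch illustrating the RDD, paired-RDD, DataFrame, and Spark SQL work listed above; the input path and column names are hypothetical placeholders rather than details from a specific project.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("summary-sketch").getOrCreate()
    sc = spark.sparkContext

    # RDD from a (hypothetical) CSV of "user_id,amount" lines
    lines = sc.textFile("hdfs:///data/transactions.csv")
    fields = lines.map(lambda line: line.split(","))
    pairs = fields.map(lambda f: (f[0], float(f[1])))   # paired RDD of (user_id, amount)
    totals = pairs.reduceByKey(lambda a, b: a + b)      # total amount per user

    # DataFrame and Spark SQL over the same data
    df = totals.toDF(["user_id", "total_amount"])
    df.createOrReplaceTempView("user_totals")
    spark.sql("SELECT user_id, total_amount FROM user_totals "
              "ORDER BY total_amount DESC LIMIT 10").show()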
TECHNICAL SKILLS
Programming: Java 1.5+, Python 2.7, R, SAS, Scala 2.10
NoSQL: MongoDB 3.4, MarkLogic 8, Cassandra
Big Data: Hadoop 2.7.4, MapReduce, Spark 2.2, Sqoop 1.4.5, Pig, Hive, HBase, HDFS
Other Skills: Major machine learning algorithms, RESTful web services, XQuery, XSLT, Apache Tomcat
Database: Microsoft SQL Server, MySQL, Oracle
Visualization/Analytical Tools: Tableau 9, 10
Operating Systems: Linux, Windows
Scripting: Shell scripting
Version Control: Git, SVN
PROFESSIONAL EXPERIENCE
Confidential - New York, NY
Big Data Developer
Responsibilities:
- Responsible for building a Spark application that extracts raw data in different formats and transforms it into a clean, understandable form used for reporting.
- Wrote Scala code for different pipelines to transform raw data and store it in Hive tables.
- Wrote Java code to parse unstructured data from files provided by different banks.
- Built a database on an Oracle server and inserted the parsed data into different tables as required.
- Joined tables based on transformation rules provided by the users and inserted the results into landing tables.
- Built Hadoop pipelines to extract, load, transform, and report on the data.
- Wrote Spark code, with Scala as the primary programming language, to perform critical calculations.
- Created Hive tables in the Hadoop environment mirroring the Oracle database schema and landed data from the Spark transformations (a sketch of this step follows this list).
- Used the Parquet file format to store the data in Hive.
- Because these Spark applications use several tables from Oracle, built a Spark application to read those tables into Hadoop and store them for further use.
- Migrated the entire Oracle code into Spark using Scala so it runs in parallel on Hadoop.
- Built Tableau dashboards to showcase the processed data and treasury data in structured tables.
- Also participated in building stress reports that internally use Monte Carlo simulation.
- Wrote complex queries in Hive/Impala to retrieve data from landing and transformation tables.
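The Hive/Parquet landing step above was written in Scala on the project; the following is a minimal PySpark equivalent of the same idea, with hypothetical database, table, and column names used purely for illustration.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Hive support is needed to create and write managed Hive tables
    spark = (SparkSession.builder
             .appName("landing-to-hive")
             .enableHiveSupport()
             .getOrCreate())

    # Hypothetical landing table produced by the parsing step
    raw = spark.table("landing.bank_files_raw")

    # Example transformation rule: keep valid rows and standardize the amount column
    cleaned = (raw.filter(F.col("record_status") == "VALID")
                  .withColumn("amount_usd", F.col("amount").cast("double")))

    # Land the result in a Parquet-backed Hive table, as described above
    (cleaned.write
            .mode("overwrite")
            .format("parquet")
            .saveAsTable("curated.bank_transactions"))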
Environment: Hadoop 2.4, Apache Spark 2.2, Oracle, Scala 2.11, Java 1.8, Hive 2.0, Impala, Tableau 9, MapReduce, Yarn, HDFS, Shell Scripting, Windows, Linux
Confidential
Data Engineer and Analyst
Responsibilities:
- Helped build a 10-node Hadoop cluster for handling huge daily datasets.
- Built an application to read and write 50 million records per day (~1 TB/day) into HDFS.
- Wrote Sqoop commands to extract historical data from Microsoft SQL Server into HDFS.
- Loaded data from HDFS into Hive tables, keeping only the required columns for efficient storage.
- Built a MapReduce application in Java to mine user details from the daily and quarterly data.
- Automated the daily mining of user details from transaction data.
- Implemented different functionality in the MapReduce program for different datasets.
- Used Java to build the MapReduce program and used the Aho-Corasick algorithm for string search.
- Built a shell script to run the Sqoop and MapReduce jobs daily in order.
- Wrote Sqoop commands to load output from HDFS back to Microsoft SQL Server after the jobs complete.
- Implemented code to connect to SQL Server from the mapper and upload data into the RDBMS directly using JDBC connectors.
- Built a custom API for pattern search on top of an Aho-Corasick package for pattern matching in Spark using Scala (see the sketch after this list).
- Included batch commands on a Windows server to run shell scripts in the Hadoop environment using PuTTY.
- Participated in building the Hadoop environment with the available resources and required tools.
- Upgraded the MapReduce program to Spark for data transformations and actions.
- Classified the mined data into categories and criteria based on matching accuracy.
- Participated in building a web interface in Java for updating the reference list and downloading it.
- Used Tableau to showcase user density by state.
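The pattern search above was implemented with a Java Aho-Corasick package inside the MapReduce and Spark jobs; the sketch below shows the same multi-pattern search idea in Python using the pyahocorasick library, with hypothetical keywords and input text.

    import ahocorasick  # pip install pyahocorasick

    # Hypothetical reference list of names/patterns to find in transaction records
    keywords = ["john doe", "jane smith", "acme corp"]

    automaton = ahocorasick.Automaton()
    for idx, kw in enumerate(keywords):
        automaton.add_word(kw, (idx, kw))
    automaton.make_automaton()  # build the Aho-Corasick trie and failure links

    def find_matches(record):
        """Return all reference-list entries found in one transaction record."""
        text = record.lower()
        return [kw for _, (_, kw) in automaton.iter(text)]

    print(find_matches("Wire transfer from ACME Corp to John Doe"))
    # ['acme corp', 'john doe']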
Environment: Hadoop 2.7, MapReduce v2, Yarn 2, Apache Spark 2.2, MsSQL, Sqoop 1.4.5, HDFS, Java 1.8, Scala 2.11, Shell Scripting, Linux
Confidential - Roseland, NJ
Big Data Developer
Responsibilities:
- Responsible for landing multi-source data in HDFS using Spark Streaming.
- Participated in converting existing Hadoop jobs to Spark jobs using Spark Core, Spark SQL, Spark Streaming, MLlib, and GraphX.
- Identified patterns using text mining and text analytics in PySpark and SparkR.
- Implemented different machine learning algorithms and observed the results for outliers.
- Used Python to clean and structure the data so it is easily converted into DataFrames.
- Built Python and R code to find outliers in the required indicators and normalize the data.
- Transformed an unstructured transactions dataset into a SAS dataset.
- Built a messaging system to collect data from different sources and produce it using Kafka.
- Mined social media content and extracted relevant information based on the user's search query.
- Performed natural language processing (NLP) on this data using Python and libraries such as NLTK and CoreNLP.
- Also used R for deep learning on the data to look for patterns.
- Participated in implementing a program to find different stop words and ignore them.
- Implemented principal component analysis (PCA) to perform dimensionality reduction (see the sketch after this list).
- Developed an artificial intelligence application for image recognition.
- Used the Microsoft Cognitive Toolkit and a convolutional neural network for image recognition.
- Stored the data in the MongoDB NoSQL database and performed different transactions.
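A minimal scikit-learn sketch of the PCA and outlier-screening steps referenced above; the indicator matrix here is randomly generated for illustration and does not reflect project data.

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA

    # Hypothetical indicator matrix: 1,000 records x 20 indicators
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 20))

    # Normalize, then project onto the components explaining 95% of the variance
    X_scaled = StandardScaler().fit_transform(X)
    pca = PCA(n_components=0.95)
    X_reduced = pca.fit_transform(X_scaled)

    # Simple outlier flag: records far from the origin in the reduced space
    distances = np.linalg.norm(X_reduced, axis=1)
    outliers = np.where(distances > distances.mean() + 3 * distances.std())[0]
    print(X_reduced.shape, len(outliers))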
Environment: Hadoop 2.4, Apache Spark 2.2, MLlib, Spark Streaming, Python 2.7, R, MongoDB 3.4, PySpark, SparkR, SAS, MarkLogic, Tableau 9
Confidential - New York, NY
Data Engineer and Analyst
Responsibilities:
- Built complex MapReduce programs in Java to run jobs on this large transactions dataset.
- Used Hadoop to store large sets of unstructured data in HDFS.
- Handled importing data into HDFS from different databases using Sqoop.
- Transformed the data using Hive and MapReduce and loaded it into HDFS.
- Performed real-time data streaming using Spark with Kafka.
- Created Tableau dashboards from the finance dataset.
- Implemented R scripts in Tableau to create calculated parameters.
- Built a small 4-node Hadoop cluster for testing big data performance.
- Wrote Pig scripts and HiveQL queries and performed analytics on the data.
- Crawled and scraped data from different websites and stored it in a database using Python (see the sketch after this list).
- Built different visualizations in Python using Matplotlib and scikit-learn to showcase the analysis.
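A minimal sketch of the crawl-and-store step mentioned above, using requests, BeautifulSoup, and SQLite; the URL, selectors, and table name are hypothetical and stand in for the actual sites and database used.

    import sqlite3
    import requests
    from bs4 import BeautifulSoup

    URL = "https://example.com/articles"  # hypothetical page to scrape

    # Fetch and parse the page
    html = requests.get(URL, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    titles = [h.get_text(strip=True) for h in soup.find_all("h2")]

    # Store the scraped titles in a local SQLite database
    conn = sqlite3.connect("scraped.db")
    conn.execute("CREATE TABLE IF NOT EXISTS articles (title TEXT)")
    conn.executemany("INSERT INTO articles (title) VALUES (?)",
                     [(t,) for t in titles])
    conn.commit()
    conn.close()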
Environment: Hadoop 2.4.5, HDFS, MapReduce, Yarn, Java 1.8, Sqoop 1.4.4, Apache Spark, Kafka, Oozie, Windows, Linux (Ubuntu), SQL, Tableau, PIG, Hive, Python 2.7
Confidential
Big Data Analyst and Developer
Responsibilities:
- Responsible for landing multi-source data in HDFS using Spark Streaming.
- Participated in converting existing Hadoop jobs to Spark jobs using Spark Core, Spark SQL, Spark Streaming, MLlib, and GraphX.
- Identified patterns using text mining and text analytics in PySpark and SparkR.
- Implemented different machine learning algorithms and observed the results for outliers.
- Used Python to clean and structure the data so it is easily converted into DataFrames.
- Built Python and R code to find outliers in the required indicators and normalize the data.
- Transformed an unstructured transactions dataset into a SAS dataset.
- Built a messaging system to collect data from different sources and produce it using Kafka (see the producer sketch after this list).
- Mined social media content and extracted relevant information based on the user's search query.
- Performed natural language processing (NLP) on this data using Python and libraries such as NLTK and CoreNLP.
- Also used R for deep learning on the data to look for patterns.
- Participated in implementing a program to find different stop words and ignore them.
- Implemented principal component analysis (PCA) to perform dimensionality reduction.
- Developed an artificial intelligence application for image recognition.
- Used the Microsoft Cognitive Toolkit and a convolutional neural network for image recognition.
- Stored the data in the MongoDB NoSQL database and performed different transactions.
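The Kafka messaging layer above fed data from multiple sources into the pipeline; below is a minimal producer sketch using the kafka-python client, with a hypothetical broker address, topic name, and record format.

    import json
    from kafka import KafkaProducer  # pip install kafka-python

    # Hypothetical broker and JSON serialization for outgoing records
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda record: json.dumps(record).encode("utf-8"),
    )

    # Publish one record per source event
    event = {"source": "social_media", "query": "example search", "text": "..."}
    producer.send("raw-events", value=event)

    producer.flush()  # block until buffered records are delivered
    producer.close()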
Environment: Hadoop 2.4, Apache Spark 2.2, MLlib, Spark Streaming, Python 2.7, R, MongoDB 3.4, PySpark, SparkR, SAS, MarkLogic, Tableau 9