Big Data Developer Resume
New York, NY
SUMMARY
- 6 years of experience in software development and building applications in information technology.
- More than 5 years of hands-on experience with big data development tools and data analytics.
- Extensive hands-on experience with programming languages including Java, Python, R, Scala, and SAS.
- Hands-on experience building efficient MapReduce programs on Apache Hadoop to run jobs and analyze big data.
- Experience with Apache Spark using Scala, Python, and Java.
- Experience with NoSQL databases as part of research-and-implementation work on an ongoing project with changing requirements.
- Extensive hands-on experience using Sqoop to move data between Microsoft SQL Server and HDFS in both directions.
- In-depth experience with big data development tools including Hive, Sqoop, HBase, Pig, Storm, Oozie, and Flume.
- Experience developing PySpark code to create RDDs, paired RDDs, and DataFrames, and using Spark SQL and Spark Streaming for fast data processing (see the PySpark sketch at the end of this summary).
- Experience as a data engineer performing data extraction, transformation, and loading.
- Experienced in data visualization using tools such as Tableau, Plotly, and QlikView.
- Hands-on experience visualizing data in Python, R, and SAS using libraries such as Matplotlib and Plotly.
- Understanding of and experience with most common machine learning algorithms and their implementations.
- Good knowledge of NoSQL databases and able to perform different transactions.
- Built custom ETL pipelines to extract, transform, and load data from Microsoft SQL Server into HDFS.
- Experience building applications using Python and Java.
- Experience building and understanding natural language processing (NLP) programs.
- Practical experience and knowledge in data mining and text analytics using Python and R.
- Worked on connecting application code to SQL databases for development and deployment.
- Hands-on experience using Cloudera Manager and the Hortonworks distribution to monitor and manage clusters.
- Participated in fraud detection, anti-money-laundering (AML), and know-your-customer (KYC) projects.
- Experience with ETL and querying in big data languages such as Pig Latin and HiveQL.
- Knowledge of big data machine learning toolkits such as Mahout and Spark ML.
- Good hands-on experience handling different data formats such as JSON and XML.
- Good knowledge of and experience with RESTful APIs.
- Familiar with AWS components such as EC2 and S3; also created instances on other cloud platforms including Azure and GCP.
- Hands-on experience with the MarkLogic NoSQL database; also worked with MongoDB and Cassandra.
- Experience with job workflow scheduling (Oozie) and cluster coordination (ZooKeeper) tools.
- Experience writing data extraction programs such as web crawlers and scrapers in Python.
- Knowledge of different NoSQL databases and of performing business analysis with visualization tools.
- In-depth knowledge of JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
- Developed projects using IDEs such as NetBeans, IntelliJ IDEA, and Eclipse.
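A minimal PySpark sketch illustrating the RDD, paired-RDD, DataFrame, and Spark SQL work listed above; the input path and column names are hypothetical placeholders rather than details from a specific project.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("summary-sketch").getOrCreate()
    sc = spark.sparkContext

    # RDD from a (hypothetical) CSV of "user_id,amount" lines
    lines = sc.textFile("hdfs:///data/transactions.csv")
    fields = lines.map(lambda line: line.split(","))
    pairs = fields.map(lambda f: (f[0], float(f[1])))   # paired RDD of (user_id, amount)
    totals = pairs.reduceByKey(lambda a, b: a + b)      # total amount per user

    # DataFrame and Spark SQL over the same data
    df = totals.toDF(["user_id", "total_amount"])
    df.createOrReplaceTempView("user_totals")
    spark.sql("SELECT user_id, total_amount FROM user_totals "
              "ORDER BY total_amount DESC LIMIT 10").show()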
TECHNICAL SKILLS
Programming: Java 1.5+, Python 2.7, R, SAS, Scala 2.10
NoSQL: MongoDB 3.4, MarkLogic 8, Cassandra
Big Data: Hadoop 2.7.4, MapReduce, Spark 2.2, Sqoop 1.4.5, Pig, Hive, HBase, HDFS
Other Skills: Major machine learning algorithms, RESTful web services, XQuery, XSLT, Apache Tomcat
Database: Microsoft SQL Server, MySQL, Oracle
Visualization/Analytical Tools: Tableau 9, 10
Operating Systems: Linux, Windows
Scripting: Shell scripting
Version Control: Git, SVN
PROFESSIONAL EXPERIENCE
Confidential - New York, NY
Big Data Developer
Responsibilities:
- Responsible for building a Spark application that extracts raw data in different formats and transforms it into a clean, understandable form used for reporting.
- Wrote Scala code for different pipelines to transform raw data and store it in Hive tables.
- Wrote Java code to parse unstructured data from files provided by different banks.
- Built a database on an Oracle server and inserted the parsed data into different tables as required.
- Joined tables based on transformation rules provided by the users and inserted the results into landing tables.
- Built Hadoop pipelines to extract, load, transform, and report on the data.
- Wrote Spark code, with Scala as the primary programming language, to perform critical calculations.
- Created Hive tables in the Hadoop environment mirroring the Oracle database schema and landed data from the Spark transformations (a sketch of this step follows this list).
- Used the Parquet file format to store the data in Hive.
- Because these Spark applications use several tables from Oracle, built a Spark application to read those tables into Hadoop and store them for further use.
- Migrated the entire Oracle code into Spark using Scala so it runs in parallel on Hadoop.
- Built Tableau dashboards to showcase the processed data and treasury data in structured tables.
- Also participated in building stress reports that internally use Monte Carlo simulation.
- Wrote complex queries in Hive/Impala to retrieve data from landing and transformation tables.
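The Hive/Parquet landing step above was written in Scala on the project; the following is a minimal PySpark equivalent of the same idea, with hypothetical database, table, and column names used purely for illustration.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Hive support is needed to create and write managed Hive tables
    spark = (SparkSession.builder
             .appName("landing-to-hive")
             .enableHiveSupport()
             .getOrCreate())

    # Hypothetical landing table produced by the parsing step
    raw = spark.table("landing.bank_files_raw")

    # Example transformation rule: keep valid rows and standardize the amount column
    cleaned = (raw.filter(F.col("record_status") == "VALID")
                  .withColumn("amount_usd", F.col("amount").cast("double")))

    # Land the result in a Parquet-backed Hive table, as described above
    (cleaned.write
            .mode("overwrite")
            .format("parquet")
            .saveAsTable("curated.bank_transactions"))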
Environment: Hadoop 2.4, Apache Spark 2.2, Oracle, Scala 2.11, Java 1.8, Hive 2.0, Impala, Tableau 9, MapReduce, Yarn, HDFS, Shell Scripting, Windows, Linux
Confidential
Data Engineer and Analyst
Responsibilities:
- Helped build a 10-node Hadoop cluster for handling huge daily datasets.
- Built an application to read and write 50 million records per day (~1 TB/day) into HDFS.
- Wrote Sqoop commands to extract historical data from Microsoft SQL Server into HDFS.
- Loaded data from HDFS into Hive tables, keeping only the required columns for efficient storage.
- Built a MapReduce application in Java to mine user details from the daily and quarterly data.
- Automated the daily mining of user details from transaction data.
- Implemented different functionality in the MapReduce program for different datasets.
- Used Java to build the MapReduce program and used the Aho-Corasick algorithm for string search.
- Built a shell script to run the Sqoop and MapReduce jobs daily in order.
- Wrote Sqoop commands to load output from HDFS back to Microsoft SQL Server after the jobs complete.
- Implemented code to connect to SQL Server from the mapper and upload data into the RDBMS directly using JDBC connectors.
- Built a custom API for pattern search on top of an Aho-Corasick package for pattern matching in Spark using Scala (see the sketch after this list).
- Included batch commands on a Windows server to run shell scripts in the Hadoop environment using PuTTY.
- Participated in building the Hadoop environment with the available resources and required tools.
- Upgraded the MapReduce program to Spark for data transformations and actions.
- Classified the mined data into categories and criteria based on matching accuracy.
- Participated in building a web interface in Java for updating the reference list and downloading it.
- Used Tableau to showcase user density by state.
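The pattern search above was implemented with a Java Aho-Corasick package inside the MapReduce and Spark jobs; the sketch below shows the same multi-pattern search idea in Python using the pyahocorasick library, with hypothetical keywords and input text.

    import ahocorasick  # pip install pyahocorasick

    # Hypothetical reference list of names/patterns to find in transaction records
    keywords = ["john doe", "jane smith", "acme corp"]

    automaton = ahocorasick.Automaton()
    for idx, kw in enumerate(keywords):
        automaton.add_word(kw, (idx, kw))
    automaton.make_automaton()  # build the Aho-Corasick trie and failure links

    def find_matches(record):
        """Return all reference-list entries found in one transaction record."""
        text = record.lower()
        return [kw for _, (_, kw) in automaton.iter(text)]

    print(find_matches("Wire transfer from ACME Corp to John Doe"))
    # ['acme corp', 'john doe']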
Environment: Hadoop 2.7, MapReduce v2, Yarn 2, Apache Spark 2.2, MsSQL, Sqoop 1.4.5, HDFS, Java 1.8, Scala 2.11, Shell Scripting, Linux
Confidential - Roseland, NJ
Big Data Developer
Responsibilities:
- Responsible for landing multi-source data in HDFS using Spark Streaming.
- Participated in converting existing Hadoop jobs to Spark jobs using Spark Core, Spark SQL, Spark Streaming, MLlib, and GraphX.
- Identified patterns using text mining and text analytics in PySpark and SparkR.
- Implemented different machine learning algorithms and observed the results for outliers.
- Used Python to clean and structure the data so it is easily converted into DataFrames.
- Built Python and R code to find outliers in the required indicators and normalize the data.
- Transformed an unstructured transactions dataset into a SAS dataset.
- Built a messaging system to collect data from different sources and produce it using Kafka.
- Mined social media content and extracted relevant information based on the user's search query.
- Performed natural language processing (NLP) on this data using Python and libraries such as NLTK and CoreNLP.
- Also used R for deep learning on the data to look for patterns.
- Participated in implementing a program to find different stop words and ignore them.
- Implemented principal component analysis (PCA) to perform dimensionality reduction (see the sketch after this list).
- Developed an artificial intelligence application for image recognition.
- Used the Microsoft Cognitive Toolkit and a convolutional neural network for image recognition.
- Stored the data in the MongoDB NoSQL database and performed different transactions.
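A minimal scikit-learn sketch of the PCA and outlier-screening steps referenced above; the indicator matrix here is randomly generated for illustration and does not reflect project data.

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA

    # Hypothetical indicator matrix: 1,000 records x 20 indicators
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 20))

    # Normalize, then project onto the components explaining 95% of the variance
    X_scaled = StandardScaler().fit_transform(X)
    pca = PCA(n_components=0.95)
    X_reduced = pca.fit_transform(X_scaled)

    # Simple outlier flag: records far from the origin in the reduced space
    distances = np.linalg.norm(X_reduced, axis=1)
    outliers = np.where(distances > distances.mean() + 3 * distances.std())[0]
    print(X_reduced.shape, len(outliers))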
Environment: Hadoop 2.4, Apache Spark 2.2, MLlib, Spark Streaming, Python 2.7, R, MongoDB 3.4, PySpark, SparkR, SAS, MarkLogic, Tableau 9
Confidential - New York, NY
Data Engineer and Analyst
Responsibilities:
- Built complex MapReduce programs in Java to run jobs on this large transactions dataset.
- Used Hadoop to store large sets of unstructured data in HDFS.
- Handled importing data into HDFS from different databases using Sqoop.
- Transformed the data using Hive and MapReduce and loaded it into HDFS.
- Performed real-time data streaming using Spark with Kafka.
- Created Tableau dashboards from the finance dataset.
- Implemented R scripts in Tableau to create calculated parameters.
- Built a small 4-node Hadoop cluster for testing big data performance.
- Wrote Pig scripts and HiveQL queries and performed analytics on the data.
- Crawled and scraped data from different websites and stored it in a database using Python (see the sketch after this list).
- Built different visualizations in Python using Matplotlib and scikit-learn to showcase the analysis.
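A minimal sketch of the crawl-and-store step mentioned above, using requests, BeautifulSoup, and SQLite; the URL, selectors, and table name are hypothetical and stand in for the actual sites and database used.

    import sqlite3
    import requests
    from bs4 import BeautifulSoup

    URL = "https://example.com/articles"  # hypothetical page to scrape

    # Fetch and parse the page
    html = requests.get(URL, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    titles = [h.get_text(strip=True) for h in soup.find_all("h2")]

    # Store the scraped titles in a local SQLite database
    conn = sqlite3.connect("scraped.db")
    conn.execute("CREATE TABLE IF NOT EXISTS articles (title TEXT)")
    conn.executemany("INSERT INTO articles (title) VALUES (?)",
                     [(t,) for t in titles])
    conn.commit()
    conn.close()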
Environment: Hadoop 2.4.5, HDFS, MapReduce, Yarn, Java 1.8, Sqoop 1.4.4, Apache Spark, Kafka, Oozie, Windows, Linux (Ubuntu), SQL, Tableau, PIG, Hive, Python 2.7
Confidential
Big Data Analyst and Developer
Responsibilities:
- Responsible for landing multi-source data in HDFS using Spark Streaming.
- Participated in converting existing Hadoop jobs to Spark jobs using Spark Core, Spark SQL, Spark Streaming, MLlib, and GraphX.
- Identified patterns using text mining and text analytics in PySpark and SparkR.
- Implemented different machine learning algorithms and observed the results for outliers.
- Used Python to clean and structure the data so it is easily converted into DataFrames.
- Built Python and R code to find outliers in the required indicators and normalize the data.
- Transformed an unstructured transactions dataset into a SAS dataset.
- Built a messaging system to collect data from different sources and produce it using Kafka (see the producer sketch after this list).
- Mined social media content and extracted relevant information based on the user's search query.
- Performed natural language processing (NLP) on this data using Python and libraries such as NLTK and CoreNLP.
- Also used R for deep learning on the data to look for patterns.
- Participated in implementing a program to find different stop words and ignore them.
- Implemented principal component analysis (PCA) to perform dimensionality reduction.
- Developed an artificial intelligence application for image recognition.
- Used the Microsoft Cognitive Toolkit and a convolutional neural network for image recognition.
- Stored the data in the MongoDB NoSQL database and performed different transactions.
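The Kafka messaging layer above fed data from multiple sources into the pipeline; below is a minimal producer sketch using the kafka-python client, with a hypothetical broker address, topic name, and record format.

    import json
    from kafka import KafkaProducer  # pip install kafka-python

    # Hypothetical broker and JSON serialization for outgoing records
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda record: json.dumps(record).encode("utf-8"),
    )

    # Publish one record per source event
    event = {"source": "social_media", "query": "example search", "text": "..."}
    producer.send("raw-events", value=event)

    producer.flush()  # block until buffered records are delivered
    producer.close()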
Environment: Hadoop 2.4, Apache Spark 2.2, MLlib, Spark Streaming, Python 2.7, R, MongoDB 3.4, PySpark, SparkR, SAS, MarkLogic, Tableau 9